Abstract

In this paper, we extend the information theoretic framework that was developed in earlier works to multi-hop 
network settings. For a given network, we construct a novel deterministic model that quantifies the ability of the 
network in transmitting private and common messages across users. Based on this model, we formulate a linear 
optimization problem that explores the throughput of a multi-layer network, thereby offering the optimal strategy 
as to what kind of common messages should be generated in the network to maximize the throughput. With this 
deterministic model, we also investigate the role of feedback for multi-layer networks, from which we identify 
a variety of scenarios in which feedback can improve transmission efficiency. Our results provide fundamental 
guidelines as to how to coordinate cooperation between users to enable efficient information exchanges across them. 


Index Terms 

Linear Information Coupling (LIC) Problem, Divergence Transition Matrix (DTM), Kullback-Leibler Divergence 
Approximation, Deterministic Model, Feedback. 

I. Introduction 

With the booming of the Internet and mobile communication, communication networks and social networks are rapidly
growing in size and density. While the global behavior of such a large network does depend on the actions of individual
users, the sheer size of the network often makes the effect of any single action insignificant. For
instance, in social networks (or stock-market networks), public opinion (or the growth rate of wealth) is barely
affected by an individual's opinion (or investment), although it is formed by their aggregation.

One natural objective for such large networks is to understand how users should design their local transmission 
strategies to optimize network information flow. To this end, we aim to develop an information-theoretic framework 
that can well model such network phenomena, as well as suggest the optimal transmission strategy of each user. 

Specifically, we consider a discrete memoryless network such that the input/output distributions of each node are 
fixed, and each node wishes to convey information by slightly perturbing the given input distribution. In this network, 
we intend to investigate how a small amount of information can be efficiently conveyed to certain destinations. 
Here the given distributions can be viewed as the global trend of the network, and the low-rate transmission of 
each node can be interpreted as an insignificant action of an individual user. We employ mutual information in an
attempt to quantify the amount of perturbation made by the users, as well as the low-rate transmission efficiency. 

By employing the notion of mutual information, earlier works [1], [2], [3] have made some progress towards
understanding the optimal transmission strategy of users for certain networks. Specifically, Borade-Zheng [1]
introduced a local geometric approach, based on an approximation of the Kullback-Leibler (KL) divergence, to
develop a novel information-theoretic framework, and applied the framework to point-to-point channels and certain
broadcast channels. Abbe-Zheng [2] employed the local geometric approach to address some interesting open
questions in Gaussian networks. Huang-Zheng [3] extended the framework to more general yet single-hop
multi-terminal settings, and coined the term linear information coupling (LIC) problems for the associated
problems (based on the framework), which will be reviewed in Section II.

In particular, [3] developed an insightful interpretation. The key observation of [3] is that under certain local
assumptions, transmission of different types of messages, such as private and common messages, can be viewed
as transmission through separated deterministic links with certain capacities. This viewpoint allows us to quantify
how much more difficult broadcasting common messages is than sending private messages, as well as to compute the
gain of transmitting common messages. This development is particularly useful for multi-hop networks because it
serves to characterize the trade-off between the gain of sending a common message and the cost incurred in creating
the common message from the previous layer.

In this work, we generalize this development to multi-hop networks, thereby shedding light on what
kinds of common messages should be created in order to optimize the trade-off. Our contributions are two-fold. The
first contribution is to extend the information-theoretic framework in [1], [2], [3] to multi-hop layered networks.
Building upon this framework, we construct a deterministic network model that allows us to quantify the efficiency
of transmitting different types of private and common messages in the networks. This deterministic model enables
us to translate the LIC problems into linear optimization problems, whose solutions suggest what kinds of
common messages should be generated to optimize the throughput. With this deterministic model, we also develop
an optimal local strategy for a large-scale layered network having identical channel parameters for each layer.
Specifically, we demonstrate that the optimal strategy is composed of a few fundamental communication modes (to
be specified in Section V-A). In general, our results provide insights into how users in a communication network
should cooperate with each other to increase the efficiency of transmitting information through the network.

The second contribution of this paper is that we further generalize the framework to networks with feedback,
thereby exploring the role of feedback in multi-hop layered networks. Specifically, we consider the same layered
networks but additionally include feedback links from each node to the nodes of the preceding layers. For these
networks, we develop the best transmission strategy of each node that maximizes transmission efficiency. The key
technique employed here relies on our new development on network equivalence, which states that the layer-by-layer
feedback strategy, which allows feedback only to the nodes in the immediately preceding layer, yields the same
performance as the most idealistic one, where feedback is available to all the nodes in all the preceding layers.
Moreover, we identify a variety of network scenarios in which feedback can strictly improve transmission efficiency.
Our deterministic model allows us to have a deeper understanding of the nature of feedback gain: feedback offers
better information routing paths, thereby making the gain of transmitting common messages effectively larger. This
feedback gain is shown to be multiplicative, which is qualitatively similar to the gain in the two-user Gaussian 
interference channel Q. 

The rest of this paper is organized as follows. In Section II, we review the LIC problems developed in the
context of certain single-hop multi-terminal networks [1], [2], [3]. The results in Section II lead to a new type of
deterministic model, which is presented in Section III. In Section IV, we apply the framework to the interference
channel, constructing a corresponding deterministic model. In Section V, we extend this deterministic model to
multi-hop layered networks, thus developing the best transmission strategy that maximizes transmission efficiency. In
Section VI, we explore the role of feedback for multi-hop layered networks, and we conclude the paper with
discussions in Sections VII and VIII.


II. Linear Information Coupling Problems 

This section is dedicated to a brief review of the linear information coupling (LIC) problems, which are formulated
based on the local geometric approach in [1], [2], [3]. Here we will summarize the local geometric approach and
its application to point-to-point channels, broadcast channels, and multiple-access channels.

In general, the LIC problems are represented in the multi-letter form. However, Huang-Zheng [3] took the
following two steps to translate them into much simpler problems: (i) translating information theory problems to
linear-algebra problems, and (ii) single-letterization. In this paper, we will focus on the first step, while referring
readers to [3] for details on the single-letterization step.$^1$

$^1$In general, the single-letter version is not equivalent to the corresponding multi-letter one for arbitrary
networks, e.g., general $K$-user BCs. However, it is shown in [3] that there always exist optimal finite-letter
solutions. Note that our approach in this paper for solving the single-letter problems can be easily extended to their
finite-letter versions, so we will consider only the single-letter problems.

A. The Local Approximation of the Kullback-Leibler Divergence

The key idea of the local geometric approach lies in an approximation of the Kullback-Leibler (KL) divergence
[1]. Let $P$ and $Q$ be two probability distributions over the same alphabet $\mathcal{X}$. We assume that $Q$ and
$P$ are close to each other, i.e., $Q(x) = P(x) + \epsilon \cdot J(x)$, for some small quantity $\epsilon$. Then,
using the second-order Taylor expansion, the KL divergence can be written as

$$D(P\|Q) = -\sum_x P(x) \ln \frac{Q(x)}{P(x)} = \frac{1}{2}\epsilon^2 \sum_x \frac{J^2(x)}{P(x)} + o(\epsilon^2) = \frac{1}{2}\epsilon^2 \|L\|^2 + o(\epsilon^2), \tag{1}$$

where $L = [\sqrt{P}^{-1}] J$, and $[\sqrt{P}^{-1}]$ is the diagonal matrix with entries
$\{1/\sqrt{P(x)},\ x \in \mathcal{X}\}$. Note that replacing $[\sqrt{P}^{-1}]$ with $[\sqrt{Q}^{-1}]$ in the above
Euclidean norm results in only a difference of order $o(\epsilon^2)$. Hence, $D(P\|Q)$ and
$D(Q\|P)$ are considered to be equal up to the first-order approximation. From this approximation, the divergence
can be viewed as the (weighted) squared Euclidean norm between two distributions. In the rest of this section,
we demonstrate how this local approximation technique can be used to translate information theory problems into
linear algebra problems.
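
The quality of the approximation (1) can be checked numerically. The following minimal Python sketch compares the
exact divergence with the quadratic term as $\epsilon$ shrinks. The distribution `P` and the perturbation `J` are
arbitrary illustrative choices (assumptions, not values from this paper); `J` sums to zero so that $Q$ remains a
valid distribution.

```python
import math

def kl_divergence(p, q):
    """Exact KL divergence D(p || q) in nats."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def local_kl(p, j, eps):
    """Quadratic term of (1): (1/2) * eps^2 * sum_x J(x)^2 / P(x)."""
    return 0.5 * eps ** 2 * sum(ji ** 2 / pi for pi, ji in zip(p, j))

def relative_gap(p, j, eps):
    """Relative difference between the exact divergence and its local approximation."""
    q = [pi + eps * ji for pi, ji in zip(p, j)]
    approx = local_kl(p, j, eps)
    return abs(kl_divergence(p, q) - approx) / approx

# Illustrative (assumed) values: any P with full support and any J summing to zero would do.
P = [0.5, 0.3, 0.2]
J = [1.0, -0.4, -0.6]

for eps in (1e-1, 1e-2, 1e-3):
    print(f"eps={eps:g}  relative gap={relative_gap(P, J, eps):.2e}")
```

The relative gap should shrink roughly in proportion to $\epsilon$, which is what the $o(\epsilon^2)$ remainder in (1) suggests.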


B. Point-to-point Channels 

In this section, we will first review the formulation of the LIC problem in the simple context of point-to-point
channels, and then explain how the local geometric approach serves to translate it into a simple linear-algebra
problem. Consider a point-to-point channel with input $X \in \mathcal{X}$, output $Y \in \mathcal{Y}$, and the channel
matrix $W$ associated with the channel transition probability $P_{Y|X}$. Given some input distribution $P_X$, the LIC
problem is formulated as:

$$\max_{U \to X \to Y:\ I(U;X) \le \frac{1}{2}\epsilon^2} I(U;Y), \tag{2}$$

where $\epsilon$ is assumed to be small. The LIC problem aims at exploring the optimal transmission strategy of each
node that wishes to send a small amount of message to certain destination(s) in networks. In the point-to-point
setting, the following interpretation makes a connection between the above problem and the goal. Let us view $U$ as a
message that the transmitter wants to send. One can then interpret $I(U;X)$ as the transmission rate of information
modulated in $X$, and $I(U;Y)$ as the data rate of information that is transferred to the receiver. Unlike classical
communication problems, the LIC problem targets a setting in which the amount of information is small. This
is captured by the above assumption that $\epsilon$ is sufficiently small. In addition, it is assumed that,$^2$ for all
$u$ and $x$, $P_{X|U=u}(x) - P_X(x) = O(\epsilon)$. See [1], [3] for more detailed discussions and justifications of
this formulation.

$^2$The assumption of small $I(U;X)$ does not necessarily imply that the $P_{X|U=u}$'s are close to $P_X$. See 0, (6).
However, the extra assumption that the $P_{X|U=u}$'s are close to $P_X$ leads to a geometric structure in the
distribution spaces, which allows us to solve general network information theory problems in a systematic way. See 0
for details. In the rest of this paper, we will employ this extra assumption and develop the geometric structure for
general networks.

The goal of (2) is to design $P_{X|U=u}$ for different $u$, such that the marginal distribution is fixed as $P_X$,
and (2) is optimized. To solve this problem, first observe that we can write the constraint as

$$I(U;X) = \sum_u P_U(u) \cdot D(P_{X|U}(\cdot|u)\|P_X) \le \frac{1}{2}\epsilon^2. \tag{3}$$

Thus, if we write $P_{X|U=u}$ as a local perturbation from $P_X$, i.e., $P_{X|U=u} = P_X + \epsilon \cdot J_u$, and
employ the notation $L_u = [\sqrt{P_X}^{-1}] \cdot J_u$, then we can simplify the constraint (3) by the local
approximation (1) as

$$\sum_u P_U(u) \cdot \|L_u\|^2 \le 1.$$

Moreover, since $U \to X \to Y$ forms a Markov relation, we have

$$P_{Y|U=u} = W P_{X|U=u} = W P_X + \epsilon \cdot W J_u = P_Y + \epsilon \cdot W [\sqrt{P_X}] L_u,$$

where the channel applied to the input distribution is simply viewed as the channel transition matrix $W$, of
dimension $|\mathcal{Y}| \times |\mathcal{X}|$, multiplying the input distribution as a vector.

Then, using the local approximation (1), the linear information coupling problem (2) becomes a linear algebra
problem:

$$\max \sum_u P_U(u) \cdot \|B L_u\|^2, \tag{4}$$

$$\text{subject to: } \sum_u P_U(u) \cdot \|L_u\|^2 \le 1, \quad \sum_x \sqrt{P_X(x)}\, L_u(x) = 0, \tag{5}$$

where the second constraint of (5) comes from

$$\sum_x \sqrt{P_X(x)}\, L_u(x) = \frac{1}{\epsilon} \sum_x \left(P_{X|U=u}(x) - P_X(x)\right) = 0.$$

Here, we denote $B = [\sqrt{P_Y}^{-1}] W [\sqrt{P_X}]$ and call it the divergence transition matrix (DTM). Note that
in both (4) and (5) the same set of weights $P_U(u)$ are used, thus the problem can be reduced to finding a direction
$L^*$ that maximizes the ratio $\|BL^*\|/\|L^*\|$, and the optimal choice of $L_u$ should be along the direction of
this $L^*$ for every $u$. By linearity of the problem, scaling $L_u$ along this direction has no effect on the result.
Thus, we can without loss of optimality choose $U$ as a uniformly distributed binary random variable, and further
reduce the problem to:

$$\max_{L_u:\ \|L_u\|^2 \le 1,\ L_u \perp \sqrt{P_X}} \|B L_u\|^2, \tag{6}$$

where $\sqrt{P_X}$ represents a $|\mathcal{X}|$-dimensional vector with entries $\sqrt{P_X(x)}$.

In order to solve (6), we would like to choose $L_u$ as the right singular vector of $B$ with the largest singular
value. However, the largest singular value of $B$ is 1 with the right and left singular vectors $\sqrt{P_X}$ and
$\sqrt{P_Y}$, and choosing $L_u$ as $\sqrt{P_X}$ violates the constraint $L_u \perp \sqrt{P_X}$. On the other hand,
the rest of the right singular vectors of $B$ are orthogonal to $\sqrt{P_X}$, satisfying the constraint
$L_u \perp \sqrt{P_X}$. Therefore, the optimal solution $L^*$ must be the right singular vector with the second largest
singular value, and the corresponding maximum information rate is

$$\max \|B L_u\|^2 = \sigma_{\mathrm{smax}}^2(B) =: \sigma^2.$$

Here $\sigma_{\mathrm{smax}}(B)$ denotes the second largest singular value of $B$, which we define as $\sigma$. This
shows that the problem is reduced to a simple linear-algebra problem of finding the fundamental direction $L^*$ that
maximizes the amount of information $I(U;Y)$ that flows into the receiver.
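
The steps above can be sketched in a few lines of NumPy: build the DTM from a channel matrix and an input
distribution, then read $\sigma^2$ and $L^*$ off the second singular pair. The function names `dtm` and
`lic_solution` and the binary symmetric test channel are illustrative assumptions, not part of the original
development.

```python
import numpy as np

def dtm(W, p_x):
    """Divergence transition matrix B = diag(P_Y)^(-1/2) W diag(P_X)^(1/2).

    W[y, x] holds P(Y = y | X = x); p_x is the fixed input distribution.
    """
    W = np.asarray(W, dtype=float)
    p_x = np.asarray(p_x, dtype=float)
    p_y = W @ p_x
    return W * np.sqrt(p_x)[None, :] / np.sqrt(p_y)[:, None]

def lic_solution(W, p_x):
    """Return (sigma^2, L*) taken from the second singular pair of the DTM.

    Assumes the second singular value is strictly below 1, so that the
    second right singular vector is orthogonal to sqrt(P_X).
    """
    _, s, vt = np.linalg.svd(dtm(W, p_x))
    return s[1] ** 2, vt[1]

# Illustrative channel (an assumption): binary symmetric, crossover 0.1, uniform input.
W_bsc = [[0.9, 0.1],
         [0.1, 0.9]]
p_uniform = [0.5, 0.5]
sigma_sq, L_star = lic_solution(W_bsc, p_uniform)
print(sigma_sq, L_star)
```

For a binary symmetric channel with crossover probability $p$ and uniform input, the DTM equals $W$ itself, whose
singular values are $1$ and $|1-2p|$, so the sketch should report $\sigma^2 = (1-2p)^2$. The sign of the returned
singular vector is arbitrary.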

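Example 1: Consider a quaternary-input binary-output point-to-point channel:

$$Y = \begin{cases} X \oplus Z_1, & X \in \{0,1\}; \\ (X \bmod 2) \oplus Z_2, & X \in \{2,3\}, \end{cases}$$

where $Z_1 \sim \mathrm{Bern}(\frac{1}{2})$ and $Z_2 \sim \mathrm{Bern}(\alpha)$. The probability transition matrix is
then computed as

$$W = \begin{bmatrix} \frac{1}{2} & \frac{1}{2} & 1-\alpha & \alpha \\ \frac{1}{2} & \frac{1}{2} & \alpha & 1-\alpha \end{bmatrix}.$$

Suppose that $P_X$ is fixed as $[\frac{1}{4}, \frac{1}{4}, \frac{1}{4}, \frac{1}{4}]^T$. We can then compute
$P_Y = W P_X = [\frac{1}{2}, \frac{1}{2}]^T$ and $B = \frac{1}{\sqrt{2}} W$. A simple computation gives:

$$L^* = \frac{1}{\sqrt{2}}[0, 0, 1, -1]^T, \qquad \sigma^2 = \|B L^*\|^2 = \frac{1}{2}(1-2\alpha)^2.$$

This solution is intuitive. Note that when $X \in \{0,1\}$, it passes through a zero-capacity channel with
$Z_1 \sim \mathrm{Bern}(\frac{1}{2})$. On the other hand, when $X \in \{2,3\}$, the channel is a binary symmetric
channel with crossover probability $\alpha$. Therefore, information can be transferred only when $X \in \{2,3\}$,
which matches the solution $L^*$ above. Note that $L^*$ contains non-zero elements only for the third and fourth
entries, corresponding to $X = 2$ and $X = 3$ respectively. When $\alpha \approx \frac{1}{2}$, the channel w.r.t.
$X \in \{2,3\}$ is very noisy. As $\alpha$ moves away from $\frac{1}{2}$, however, the channel becomes less noisy,
thus delivering more information. This is reflected in the form of $\sigma^2$ above. □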
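
As a sanity check on Example 1, the following Python sketch evaluates the exact mutual informations for a concrete
perturbation and compares them with the local predictions $I(U;X) \approx \frac{1}{2}\epsilon^2$ and
$I(U;Y) \approx \frac{1}{2}\epsilon^2\sigma^2$. It assumes a uniform binary $U$ with
$P_{X|U=u} = P_X \pm \epsilon[\sqrt{P_X}]L^*$; the values $\alpha = 0.1$ and $\epsilon = 0.05$ and the function name
`example1_rates` are illustrative choices.

```python
import math

def kl(p, q):
    """Exact KL divergence D(p || q) in nats."""
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)

def example1_rates(alpha, eps):
    """Exact (I(U;X), I(U;Y)) for the channel of Example 1 perturbed along L*."""
    W = [[0.5, 0.5, 1 - alpha, alpha],
         [0.5, 0.5, alpha, 1 - alpha]]
    p_x = [0.25] * 4
    L = [0.0, 0.0, 1 / math.sqrt(2), -1 / math.sqrt(2)]   # optimal direction L*
    J = [math.sqrt(p) * l for p, l in zip(p_x, L)]         # J = [sqrt(P_X)] L
    p_y = [sum(w * p for w, p in zip(row, p_x)) for row in W]
    i_ux = i_uy = 0.0
    for sign in (+1, -1):                                  # U is uniform on two values
        p_x_u = [p + sign * eps * j for p, j in zip(p_x, J)]
        p_y_u = [sum(w * p for w, p in zip(row, p_x_u)) for row in W]
        i_ux += 0.5 * kl(p_x_u, p_x)
        i_uy += 0.5 * kl(p_y_u, p_y)
    return i_ux, i_uy

alpha, eps = 0.1, 0.05
i_ux, i_uy = example1_rates(alpha, eps)
sigma_sq = 0.5 * (1 - 2 * alpha) ** 2
print(i_ux / (0.5 * eps ** 2), i_uy / i_ux, sigma_sq)
```

The first printed ratio should be close to 1 and the second close to $\sigma^2$, with the small discrepancies coming
from the higher-order terms that the local approximation drops.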
C. Broadcast Channels

Now, let us consider the LIC problem of broadcast channels. Suppose that a two-receiver discrete memoryless
broadcast channel with input $X \in \mathcal{X}$ and two outputs $(Y_1, Y_2) \in \mathcal{Y}_1 \times \mathcal{Y}_2$
is specified by the memoryless channel matrices $W_1$ and $W_2$. These channel matrices specify the conditional
distributions of the output signals at the two receivers as $W_k(y_k|x) = P_{Y_k|X}(y_k|x)$ for $k = 1, 2$. Let $U_0$
be a common message intended for both receivers, and $U_1$, $U_2$ be private messages intended for receivers 1 and 2
respectively. Assume that $(U_0, U_1, U_2)$ are mutually independent and $P_X$ is fixed. Let $(R_1, R_2, R_0)$ be the
corresponding information rates.

For this setting, the LIC problem is formulated as the one that maximizes a rate region $\mathcal{R}_{BC}$ such that

$$R_1 \le I(U_1;Y_1), \quad R_2 \le I(U_2;Y_2), \quad R_0 \le \min\{I(U_0;Y_1), I(U_0;Y_2)\}, \tag{7}$$

under the locality assumption of

$$I(U_1;X) \le \frac{1}{2}\epsilon_1^2, \quad I(U_2;X) \le \frac{1}{2}\epsilon_2^2, \quad I(U_0;X) \le \frac{1}{2}\epsilon_0^2, \quad \epsilon_1^2 + \epsilon_2^2 + \epsilon_0^2 = \epsilon^2.$$

Here $(U_0, U_1, U_2) \to X \to (Y_1, Y_2)$ forms a Markov relation and $\epsilon$ is assumed to be some small quantity.

While a natural extension of the point-to-point-channel locality assumption is
$I(U_1, U_2, U_0; X) \le \frac{1}{2}\epsilon^2$, it can be shown [3] that the resultant rate region
$\mathcal{R}_{BC}$ with this assumption is the same as the one obtained under the above three separate assumptions.
Note that $I(U_1, U_2, U_0; X) \le \frac{1}{2}\epsilon^2$ captures the tradeoff between $(R_1, R_2, R_0)$ in an
aggregated manner, thus making the optimization involved. On the other hand, under the separate assumptions, the
tradeoff is captured only by $\epsilon_1^2 + \epsilon_2^2 + \epsilon_0^2 = \epsilon^2 \ll 1$: given that
$\epsilon^2$ is appropriately allocated to $(\epsilon_1^2, \epsilon_2^2, \epsilon_0^2)$, there is
no tension between those rates. Hence, this simplification enables us to reduce the problem to three independent
sub-problems: two are w.r.t. the private messages $(U_1, U_2)$, and the last is w.r.t. the common message $U_0$.

The optimization problems w.r.t. the private messages are the same as in the point-to-point channel case: for
$k = 1, 2$,

$$\max I(U_k;Y_k) = \frac{1}{2}\epsilon_k^2 \cdot \sigma_k^2 + o(\epsilon^2),$$

where $\sigma_k = \sigma_{\mathrm{smax}}(B_k)$, and $B_k = [\sqrt{P_{Y_k}}^{-1}] W_k [\sqrt{P_X}]$. Thus, the main
focus here is the optimization of the common information rate. Suppose that
$P_{X|U_0=u_0} = P_X + \epsilon_0 \cdot J_{u_0}$, and $L_{u_0} = [\sqrt{P_X}^{-1}] J_{u_0}$. Using similar arguments,
we can then reduce the problem to:

$$\max_{L_{u_0}:\ \|L_{u_0}\|^2 \le 1,\ L_{u_0} \perp \sqrt{P_X}} \min\left\{\|B_1 L_{u_0}\|^2, \|B_2 L_{u_0}\|^2\right\}. \tag{8}$$

Now, this problem is simply a finite dimensional convex optimization problem, which can be easily solved. Let
$\sigma_0^2$ be the maximum value, attained by the optimal $L_{u_0}^*$.

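Example 2: Consider a quaternary-input binary-outputs BC: for $k \in \{1,2\}$,

$$Y_k = \begin{cases} X \oplus Z_{k1}, & X \in \{0,1\}; \\ (X \bmod 2) \oplus Z_{k2}, & X \in \{2,3\}, \end{cases}$$

where $Z_{11}, Z_{22} \sim \mathrm{Bern}(\frac{1}{2})$ and $Z_{12}, Z_{21} \sim \mathrm{Bern}(\alpha)$. The
transition probability matrices are computed as

$$W_1 = \begin{bmatrix} \frac{1}{2} & \frac{1}{2} & 1-\alpha & \alpha \\ \frac{1}{2} & \frac{1}{2} & \alpha & 1-\alpha \end{bmatrix}, \qquad W_2 = \begin{bmatrix} 1-\alpha & \alpha & \frac{1}{2} & \frac{1}{2} \\ \alpha & 1-\alpha & \frac{1}{2} & \frac{1}{2} \end{bmatrix}.$$

Suppose that $P_X$ is fixed as $[\frac{1}{4}, \frac{1}{4}, \frac{1}{4}, \frac{1}{4}]^T$. We can then get
$P_{Y_1} = P_{Y_2} = [\frac{1}{2}, \frac{1}{2}]^T$. This allows us to compute $B_k = \frac{1}{\sqrt{2}} W_k$
$(k = 1, 2)$. With a simple linear-algebra calculation, we obtain

$$L_{u_1}^* = \frac{1}{\sqrt{2}}[0, 0, 1, -1]^T, \qquad \sigma_1^2 = \frac{1}{2}(1-2\alpha)^2,$$
$$L_{u_2}^* = \frac{1}{\sqrt{2}}[1, -1, 0, 0]^T, \qquad \sigma_2^2 = \frac{1}{2}(1-2\alpha)^2,$$
$$L_{u_0}^* = \frac{1}{2}[1, -1, -1, 1]^T, \qquad \sigma_0^2 = \frac{1}{4}(1-2\alpha)^2.$$

Here, one can see the difficulty of delivering a common message, as compared to private message transmission. Note
that $\sigma_0^2$ is half of $\sigma_1^2 (= \sigma_2^2)$. This example represents an extreme case where $\sigma_0$
is minimized over all possible channels having the same $\sigma_1$ and $\sigma_2$, and thus the gap between
$\sigma_0$ and $\sigma_1 (= \sigma_2)$ is maximized. Note that $\sigma_0^2$ has a trivial lower bound. It must be at
least as large as a naive transmission rate, $\min\{\lambda\sigma_1^2, (1-\lambda)\sigma_2^2\}$, which can be
achieved by privately sending a message first to receiver 1 for the fraction $\lambda$ of time and later to
receiver 2 for the remaining fraction $(1-\lambda)$ of time. This naive rate can be maximized as:

$$\max_{0 \le \lambda \le 1} \min\left\{\lambda\sigma_1^2, (1-\lambda)\sigma_2^2\right\} = \frac{\sigma_1^2 \sigma_2^2}{\sigma_1^2 + \sigma_2^2}. \tag{9}$$

In this example, this rate is maximized as $\frac{1}{4}(1-2\alpha)^2$, which coincides with $\sigma_0^2$. □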
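
The numbers in Example 2 can be bracketed numerically. The sketch below is not the solution method of this paper;
it is one simple way, offered as an assumption, to sandwich the value of (8). Any feasible direction gives a lower
bound. For an upper bound, note that $\min\{a, b\} \le ta + (1-t)b$ for every $t \in [0,1]$, so the value of (8) is
at most the largest eigenvalue of $tB_1^TB_1 + (1-t)B_2^TB_2$ restricted to the subspace orthogonal to
$\sqrt{P_X}$, minimized over $t$. The helper names, the grid over $t$, and $\alpha = 0.1$ are illustrative choices.

```python
import numpy as np

def dtm(W, p_x):
    """B = diag(P_Y)^(-1/2) W diag(P_X)^(1/2), with W[y, x] = P(y | x)."""
    W = np.asarray(W, dtype=float)
    p_x = np.asarray(p_x, dtype=float)
    p_y = W @ p_x
    return W * np.sqrt(p_x)[None, :] / np.sqrt(p_y)[:, None]

def common_rate_bounds(B1, B2, p_x, L_candidate, grid=101):
    """Return (lower, upper) bounds on the max-min value in (8)."""
    root = np.sqrt(np.asarray(p_x, dtype=float))
    L = np.asarray(L_candidate, dtype=float)
    if abs(L @ root) > 1e-9 or L @ L > 1 + 1e-9:
        raise ValueError("candidate direction is not feasible")
    # Lower bound: value of the objective at a feasible direction.
    lower = min(np.sum((B1 @ L) ** 2), np.sum((B2 @ L) ** 2))
    # Upper bound: project onto the subspace orthogonal to sqrt(P_X), then
    # minimize the top eigenvalue of the convex combination over a grid.
    proj = np.eye(len(root)) - np.outer(root, root)
    A1 = proj @ B1.T @ B1 @ proj
    A2 = proj @ B2.T @ B2 @ proj
    upper = min(np.linalg.eigvalsh(t * A1 + (1 - t) * A2)[-1]
                for t in np.linspace(0.0, 1.0, grid))
    return lower, upper

alpha = 0.1
W1 = [[0.5, 0.5, 1 - alpha, alpha], [0.5, 0.5, alpha, 1 - alpha]]
W2 = [[1 - alpha, alpha, 0.5, 0.5], [alpha, 1 - alpha, 0.5, 0.5]]
p_x = [0.25] * 4
B1, B2 = dtm(W1, p_x), dtm(W2, p_x)
L0 = 0.5 * np.array([1.0, -1.0, -1.0, 1.0])   # candidate direction from Example 2
lower, upper = common_rate_bounds(B1, B2, p_x, L0)
sigma1_sq = np.linalg.svd(B1, compute_uv=False)[1] ** 2
sigma2_sq = np.linalg.svd(B2, compute_uv=False)[1] ** 2
time_sharing = sigma1_sq * sigma2_sq / (sigma1_sq + sigma2_sq)   # right-hand side of (9)
print(lower, upper, time_sharing)
```

For this channel the two bounds should meet at $\frac{1}{4}(1-2\alpha)^2$, which is also the time-sharing rate (9).
In general the grid only approximates the minimization over $t$, so the upper bound is reliable but may be loose.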

D. Multiple-access Channels 

Now, let us consider the LIC problem of multiple-access channels. Suppose that the multiple-access channel
has two inputs $X_1 \in \mathcal{X}_1$, $X_2 \in \mathcal{X}_2$, and one output $Y \in \mathcal{Y}$. The memoryless
channel is specified by the channel matrix $W$, where $W(y|x_1, x_2) = P_{Y|X_1,X_2}(y|x_1, x_2)$ is the conditional
distribution of the output signals. We want to communicate three messages $(U_1, U_2, U_0)$ to the receiver with
rates $(R_1, R_2, R_0)$, where $U_1$ and $U_2$ are privately known by transmitters 1 and 2 respectively, and $U_0$ is
the common source known to both transmitters. Then, the LIC problem for the MAC is formulated as the one that
maximizes a rate region $\mathcal{R}_{MAC}$:

$$R_0 \le I(U_0;Y), \quad R_1 \le I(U_1;Y), \quad R_2 \le I(U_2;Y), \tag{10}$$

such that $U_0 \to (X_1, X_2) \to Y$, $U_1 \to X_1 \to Y$, $U_2 \to X_2 \to Y$, and the local constraints are:

$$I(U_1;X_1) \le \frac{1}{2}\epsilon_1^2, \quad I(U_2;X_2) \le \frac{1}{2}\epsilon_2^2, \quad I(U_0;X_1,X_2) \le \frac{1}{2}\epsilon_0^2, \quad \epsilon_1^2 + \epsilon_2^2 + \epsilon_0^2 = \epsilon^2.$$

Again, $\epsilon$ is assumed to be some small quantity.

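Define the DTMs $B_k = [\sqrt{P_Y}^{-1}] W_k [\sqrt{P_{X_k}}]$, for $k = 1, 2$, where

$$W_k(y|x_k) = \sum_{x_{3-k}} W(y|x_1, x_2) P_{X_{3-k}}(x_{3-k}).$$

The two optimization problems w.r.t. the private messages are the same as in the point-to-point channel case:
$\max I(U_k;Y) = \frac{1}{2}\epsilon_k^2 \sigma_k^2 + o(\epsilon^2)$, where $\sigma_k = \sigma_{\mathrm{smax}}(B_k)$.

Now suppose that

$$P_{X_i|U_0=u_0} = P_{X_i} + \epsilon_0 \cdot J_{i,u_0}, \quad i = 1, 2.$$

Since $X_1$ and $X_2$ are conditionally independent given $U_0$, we can write $P_{X_1,X_2|U_0=u_0}$ as

$$P_{X_1,X_2|U_0=u_0} = P_{X_1|U_0=u_0} \otimes P_{X_2|U_0=u_0} = P_{X_1} \otimes P_{X_2} + \epsilon_0 \cdot J_{1,u_0} \otimes P_{X_2} + \epsilon_0 \cdot P_{X_1} \otimes J_{2,u_0} + O(\epsilon_0^2).$$

Then, the condition $I(U_0; X_1, X_2) \le \frac{1}{2}\epsilon_0^2$ can be written as

$$\sum_{u_0} P_{U_0}(u_0) \cdot \|L_{u_0}\|^2 \le 1,$$

where $L_{u_0} = \left[ \left([\sqrt{P_{X_1}}^{-1}] J_{1,u_0}\right)^T \ \left([\sqrt{P_{X_2}}^{-1}] J_{2,u_0}\right)^T \right]^T$.
Moreover, we can write $P_{Y|U_0=u_0}$ as

$$P_{Y|U_0=u_0} = W \cdot P_{X_1,X_2|U_0=u_0} = P_Y + \epsilon_0 W_1 J_{1,u_0} + \epsilon_0 W_2 J_{2,u_0} + O(\epsilon_0^2),$$

so $I(U_0;Y)$ can be written as

$$I(U_0;Y) = \frac{1}{2}\epsilon_0^2 \cdot \sum_{u_0} P_{U_0}(u_0) \cdot \|B_0 L_{u_0}\|^2 + o(\epsilon_0^2),$$

where $B_0 = [B_1 \ B_2]$. Therefore, the optimization problem w.r.t. the common message can be reduced to

$$\max_{L_{u_0}:\ \|L_{u_0}\|^2 \le 1} \|B_0 L_{u_0}\|^2. \tag{11}$$

Observe that unlike the point-to-point channel case, $L_{u_0}$ has to respect the constraint that the first
$|\mathcal{X}_1|$ entries of $L_{u_0}$ (an $(|\mathcal{X}_1| + |\mathcal{X}_2|)$-dimensional vector) are orthogonal
to $\sqrt{P_{X_1}}$, and the last $|\mathcal{X}_2|$ entries of $L_{u_0}$ are orthogonal to $\sqrt{P_{X_2}}$.
Nevertheless, it is shown in [3] that the optimal $L_{u_0}^*$ in (11) is still the right singular vector of $B_0$ with
the second largest singular value. Hence, the maximum of (11) is $\sigma_0^2$, i.e., the maximum common rate is
$\frac{1}{2}\epsilon_0^2 \sigma_0^2$, where $\sigma_0 = \sigma_{\mathrm{smax}}([B_1 \ B_2])$.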

Example 3: Consider a quaternary-inputs binary-output MAC with

$$P(0|x_1 x_2) = \begin{cases} \frac{1}{3}(2-\alpha), & x_1 x_2 \in \{00, 01, 02, 10, 11, 12\}; \\ \alpha, & x_1 x_2 \in \{03, 13, 23, 33\}; \\ \frac{1}{3}(4-5\alpha), & x_1 x_2 \in \{20, 21, 22, 32\}; \\ \frac{1}{3}(-2+7\alpha), & x_1 x_2 \in \{30, 31\}, \end{cases}$$

$$P(1|x_1 x_2) = 1 - P(0|x_1 x_2), \quad \forall (x_1, x_2).$$

Here we assume that $\frac{2}{7} \le \alpha \le \frac{5}{7}$, which allows us to have a valid probability
distribution. Suppose that both $P_{X_1}$ and $P_{X_2}$ are fixed as
$[\frac{1}{4}, \frac{1}{4}, \frac{1}{4}, \frac{1}{4}]^T$. The probability transition matrices are then given by

$$W_1 = W_2 = \begin{bmatrix} \frac{1}{2} & \frac{1}{2} & 1-\alpha & \alpha \\ \frac{1}{2} & \frac{1}{2} & \alpha & 1-\alpha \end{bmatrix}.$$

We can then compute $B_1 = B_2 = \frac{1}{\sqrt{2}} W_1$. Hence, for $k = 1, 2$, we get the same
$(L_{u_k}^*, \sigma_k^2)$ as $(L_{u_1}^*, \sigma_1^2)$ in Example 2. For $(L_{u_0}^*, \sigma_0^2)$, we obtain

$$L_{u_0}^* = \frac{1}{2}[0, 0, 1, -1, 0, 0, 1, -1]^T, \qquad \sigma_0^2 = (1-2\alpha)^2.$$

Here we can see a gain due to coherent combining of the transmitted signals. Notice that the common rate
$\sigma_0^2$ is double the private rate $\sigma_1^2 = \sigma_2^2$. One can interpret this as a so-called beamforming
gain, a term widely used to indicate the coherent combining gain in the context of multi-antenna Gaussian channels. □
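
Example 3 can be reproduced numerically. The sketch below builds the MAC transition table as read from the example
(with entries $\frac{1}{3}(2-\alpha)$, $\alpha$, $\frac{1}{3}(4-5\alpha)$, $\frac{1}{3}(-2+7\alpha)$), marginalizes it
into $W_1$ and $W_2$, and takes second singular values of $B_1$, $B_2$, and the stacked matrix $[B_1 \ B_2]$. The
function names and the value $\alpha = 0.4$ are illustrative assumptions.

```python
import numpy as np

def example3_channel(alpha):
    """P(Y = 0 | x1, x2) as a 4x4 table indexed [x1, x2], following Example 3."""
    p0 = np.full((4, 4), (2 - alpha) / 3)   # pairs 00, 01, 02, 10, 11, 12
    p0[:, 3] = alpha                        # pairs 03, 13, 23, 33
    p0[2, :3] = (4 - 5 * alpha) / 3         # pairs 20, 21, 22
    p0[3, 2] = (4 - 5 * alpha) / 3          # pair 32
    p0[3, :2] = (-2 + 7 * alpha) / 3        # pairs 30, 31
    return p0

def mac_sigma_sq(p0, p1, p2):
    """Return (sigma_1^2, sigma_2^2, sigma_0^2) for a binary-output MAC."""
    W = np.stack([p0, 1 - p0])                    # W[y, x1, x2]
    W1 = np.einsum('yab,b->ya', W, p2)            # average out x2
    W2 = np.einsum('yab,a->yb', W, p1)            # average out x1
    p_y = W1 @ p1
    B1 = W1 * np.sqrt(p1) / np.sqrt(p_y)[:, None]
    B2 = W2 * np.sqrt(p2) / np.sqrt(p_y)[:, None]
    second = lambda B: np.linalg.svd(B, compute_uv=False)[1] ** 2
    return second(B1), second(B2), second(np.hstack([B1, B2]))

alpha = 0.4
p0 = example3_channel(alpha)
uniform = np.full(4, 0.25)
s1, s2, s0 = mac_sigma_sq(p0, uniform, uniform)
print(s1, s2, s0)
```

With these inputs the private values should both equal $\frac{1}{2}(1-2\alpha)^2$ and the common value
$(1-2\alpha)^2$, i.e., twice the private one, which is the coherent combining gain described above.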


III. A New Deterministic Model

The local geometric framework in Section II provides a systematic approach to exploring the LIC problems. It
turns out that this approach allows us to abstract arbitrary communication networks with a few key parameters 
induced by the networks, thus developing a novel deterministic model. In this section, we construct deterministic 
models for the point-to-point, broadcast and multiple-access channels discussed in the preceding section, and will 
extend to more general communication networks in the following sections. 

Prior to describing our model, we emphasize three distinguishing features of the model with a comparison to 
one popular deterministic model: the Avestimehr-Diggavi-Tse (ADT) model Q. 

• Target channels: While the ADT model is intended for capturing key properties of wireless Gaussian channels, 
our model aims at arbitrary discrete-memoryless channels. 

• Approximation: In the ADT model, approximation to Gaussian channels is accurate when links have high
signal-to-noise ratios. On the other hand, our model relies upon the Euclidean approximation and hence it is
accurate as long as the channels are assumed to be very noisy, i.e., $P_{X|U=u}$ being close to $P_X$. The locality
assumption limits our model in approximating general not-very-noisy channels.

• Signal interactions in the noise-limited regime: The ADT model focuses on the interaction of transmitted
signals rather than on background noises, thus well representing the interference-limited regime, where the
noise power is negligible compared to signal powers. Our model, however, can well represent noise-limited
regimes in which a beamforming gain often occurs. Moreover, even for very noisy channels, signal interactions
can be captured in our model. This is a significant distinction with respect to the ADT model targeted for
Gaussian channels. Note that for very noisy Gaussian channels, signal interactions are completely ignored as
the channels are considered as multiple point-to-point links in the noise-limited regime.

Remark 1: While our model does not approximate well the not-very-noisy channels that represent many
realistic communication scenarios, it still plays a role in some realistic networks. One such example is a cognitive
radio network in which secondary users wish to exchange their information while minimizing interference to the
existing communication network for primary users. By modeling the encoding of the secondary users' signals as
superposition coding on top of existing primary signals, we can formulate an LIC problem that characterizes the
tradeoff between the communication rates of the secondary users and the interference to the existing communication
network. In Section VII-C, we will provide more detailed discussions on this, and also show the potential of our
model for a wide range of other interesting applications beyond communications.

Notations: For illustrative purposes, we shall use the following notations for the rest of this paper. Let $\delta$
and $\delta_k$ be $\frac{1}{2}\epsilon^2$ and $\frac{1}{2}\epsilon_k^2$, respectively. In fact, we assume that
$\delta$ is a small value, as it allows us to exploit the local approximation to derive capacity regions. However,
once the capacity regions are obtained, $\delta$ acts only as a scaling factor. So for simplicity, we normalize the
regions by replacing $\delta$ with 1. In addition, in order to distinguish the local-approximation-based capacity
region from the traditional one, we shall call it the linear information coupling (LIC) capacity region. With slight
abuse of notation, we will use the notation $\mathcal{C}$ (usually employed to indicate the conventional capacity
region) to denote the LIC capacity region. We will also use the notation $\mathcal{C}_{\mathrm{sum}}$
to indicate the LIC sum capacity.

A. Point-to-point Channels 

For a point-to-point channel, from Section II-B, the LIC capacity is simply $I(U;Y) \approx \delta \cdot \sigma^2$.
This naturally leads us to model the point-to-point channel as a single bit-pipe with capacity $\delta\sigma^2$. Here
the quantity $\sigma$ can be computed simply as the second largest singular value of the DTM. Importantly, note that
this deterministic model provides a general framework, as it can abstract every discrete-memoryless point-to-point
channel with a single quantity $\sigma$.


B. Broadcast Channels 

For a general broadcast channel, the LIC capacity region (7) is derived as

$$\mathcal{C}_{BC} = \bigcup_{\delta_1 + \delta_2 + \delta_0 \le 1} \left\{(R_1, R_2, R_0) : R_k \le \delta_k \sigma_k^2,\ k \in [0:2]\right\},$$

where the $\sigma_k$'s can be computed as in Section II-C. This simple formula of the region leads us to model a
broadcast channel as three bit-pipes, each having capacity $\delta_k \sigma_k^2$. Unlike traditional wired networks,
the capacities of these bit-pipes are flexible: $\delta_k \sigma_k^2$ can vary depending on different allocations of
$(\delta_1, \delta_2, \delta_0)$ subject to $\delta_1 + \delta_2 + \delta_0 \le 1$.
Hence, the LIC capacity region is of the shape as shown in the right figure of Fig. 1.
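
Because the region is a union of boxes over all allocations of the budget, membership reduces to a single linear
test: a rate $R_k$ needs the share $\delta_k = R_k/\sigma_k^2$, so a triple is achievable exactly when these shares
sum to at most 1. The following Python sketch illustrates this under the assumption that the $\delta_k$ are
nonnegative; the function name `in_lic_region` and the numerical values are hypothetical.

```python
def in_lic_region(rates, sigma_sq, tol=1e-12):
    """Check whether rates = (R_1, R_2, R_0) lie in the normalized LIC region.

    sigma_sq = (sigma_1^2, sigma_2^2, sigma_0^2). A rate R_k needs the share
    delta_k = R_k / sigma_k^2, so the triple is achievable iff the shares are
    nonnegative and sum to at most 1.
    """
    total = 0.0
    for r, s in zip(rates, sigma_sq):
        if r < 0:
            return False
        if s == 0:
            if r > tol:
                return False      # a zero-capacity pipe carries nothing
            continue
        total += r / s
    return total <= 1 + tol

# Illustrative (assumed) bit-pipe parameters.
sigma_sq = (0.32, 0.32, 0.16)
print(in_lic_region((0.16, 0.08, 0.04), sigma_sq))   # shares 1/2 + 1/4 + 1/4
print(in_lic_region((0.20, 0.20, 0.00), sigma_sq))   # shares 0.625 + 0.625
```

The first triple sits on the boundary of the region and the second lies outside it.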

The left figure in Fig. 1 shows a pictorial representation of our deterministic model for discrete-memoryless
broadcast channels. Here physical-Rx $k$ wishes to decode its private message $U_k$ as well as the common message
$U_0$. So we can represent physical-Rx $k$ by two virtual receivers, say Rx $k$ and Rx 0, which intend to decode $U_k$
and $U_0$ respectively. Employing the virtual receivers, we now model the broadcast channel with one transmitter and
three receivers in which each receiver decodes its individual message. Here the circles indicate bit-pipes intended
for transmission of different messages. For instance, the top circle indicates a bit-pipe w.r.t. the $U_1$-message
transmission. Note that different types of messages are delivered via parallel channels, identified by circles.

Another significant distinction w.r.t. the traditional wired network model is that the channel parameters
$(\sigma_1^2, \sigma_2^2, \sigma_0^2)$ have to respect the inequality that intrinsically comes from the structure of
the broadcast channel:

$$\frac{\sigma_1^2 \sigma_2^2}{\sigma_1^2 + \sigma_2^2} \le \sigma_0^2 \le \min\{\sigma_1^2, \sigma_2^2\}. \tag{12}$$

Notice that the lower bound can be achieved as shown in Example 2. This equality corresponds to the case where
the two optimal perturbation vectors for the two users are, in a sense, orthogonal, and it is difficult to find
a communication scheme that conveys much information to both receivers simultaneously. On the other hand, the
equality of the upper bound holds when the two optimal communication directions of the two users are aligned with
each other, so that one can design a perturbation vector that broadcasts information to both receivers efficiently.
Moreover, the upper bound implies that common-message transmission requires more communication resources
than private-message transmission does. Following the procedure in Section II-C, one can explicitly compute the
$\sigma_k$'s, thus quantifying the cost difference between common-message and private-message transmissions.

[Figure 1: (left) bit-pipe model of the BC with one pipe each for $U_1$, $U_0$, and $U_2$; (right) the LIC capacity region $\mathcal{C}_{BC} = \bigcup_{\delta_1+\delta_2+\delta_0 \le 1} \{(R_1, R_2, R_0) : R_1 \le \delta_1\sigma_1^2, R_2 \le \delta_2\sigma_2^2, R_0 \le \delta_0\sigma_0^2\}$.]

Fig. 1. The bit-pipe deterministic model for discrete-memoryless broadcast channels. The LIC capacity region leads us
to abstract a BC as a deterministic channel with three bit-pipes, each having the capacity of $\delta_k \sigma_k^2$.
Note that the capacity $\delta_k \sigma_k^2$ can change depending on an allocation of
$(\delta_1, \delta_2, \delta_0)$. Here we normalize the capacity region by $\delta$. Rx $k$ indicates a virtual
terminal that decodes only $U_k$, for $k = 0, 1, 2$. Hence, physical-Rx $k$ consists of virtual-Rx $k$ and
virtual-Rx 0, for $k = 1, 2$.

In addition, in this deterministic model, the trade-off between $(R_1, R_2, R_0)$ can be well adjusted with
$(\delta_1, \delta_2, \delta_0)$ subject to $\delta_1 + \delta_2 + \delta_0 \le 1$. This trade-off can be precisely
evaluated from $\mu$-sum-rate maximization, which can be carried out via a simple LP problem formulation as follows:

$$\max \sum_{k=0}^{2} \mu_k \cdot (\delta_k \sigma_k^2) \ : \ \sum_{k=0}^{2} \delta_k \le 1.$$

In the case of the sum-rate maximization where $\mu_k = 1, \forall k$, we can get
$\mathcal{C}_{\mathrm{sum}} = \max\{\sigma_1^2, \sigma_2^2, \sigma_0^2\} = \max\{\sigma_1^2, \sigma_2^2\}$.
Here we have used (12). This solution implies that common-message transmission is more expensive, and hence
choosing a more capable link among private-message bit-pipes yields the maximum sum rate.
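
The LP above is small enough to solve by inspection: the objective is linear in $(\delta_0, \delta_1, \delta_2)$, so
an optimum sits at a vertex of the simplex. A minimal Python sketch, assuming $\delta_k \ge 0$ and using
hypothetical parameter values, is:

```python
def mu_sum_rate(mu, sigma_sq):
    """Maximize sum_k mu_k * delta_k * sigma_k^2 over delta_k >= 0, sum_k delta_k <= 1.

    Inputs are ordered k = 0, 1, 2. The objective is linear in delta, so an
    optimum puts the whole budget on the pipe with the largest mu_k * sigma_k^2.
    Returns (optimal value, optimal delta).
    """
    gains = [m * s for m, s in zip(mu, sigma_sq)]
    best = max(range(len(gains)), key=lambda k: gains[k])
    delta = [0.0] * len(gains)
    if gains[best] > 0:
        delta[best] = 1.0
    return max(gains[best], 0.0), delta

# Illustrative (assumed) values, ordered (sigma_0^2, sigma_1^2, sigma_2^2).
sigma_sq = (0.16, 0.32, 0.32)
sum_capacity, delta_sum = mu_sum_rate((1, 1, 1), sigma_sq)
weighted, delta_weighted = mu_sum_rate((3, 1, 1), sigma_sq)
print(sum_capacity, delta_sum)
print(weighted, delta_weighted)
```

With equal weights the whole budget goes to a private bit-pipe, in line with the sum-capacity expression above; the
common bit-pipe is selected only once its weight $\mu_0$ is large enough to offset its smaller $\sigma_0^2$.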

C. Multiple-Access Channels 

The LIC capacity region (fTOl) for the multiple-access channel is derived as 

Cmac = U {(-^1)-R2,-Ro) :-Rfc < 5^(71,^ G [0,2]} , 

where cjfc’s can be computed as in Section ITl-Dl Therefore, any discrete-memoryless MAC can be modeled as three 
bit-pipes, each having capacity Skcr^- See Fig. |2l Applying similar ideas as in the broadcast channel, we model 
physical-Tx k by two virtual transmitters, say Tx k and Tx 0, which wishes to send the private message Uk and 
the common message Uq respectively. So the multiple access channel is modeled with three transmitters and one 
receiver. 

Similarly, channel parameters ((T^, (tI, (Tq) here should also satisfy the inequality that comes intrinsically from 
the MAC structure: 

max{(T^,cj2} < ctq < erf (13) 

The lower bound of (13) is straightforward. To see the upper bound, notice that for any valid perturbation vector
$L = [L_1^T\ L_2^T]^T$,

$$\|B_0 L\|^2 \le \left(\|B_1 L_1\| + \|B_2 L_2\|\right)^2 \le \left(\sigma_1 \|L_1\| + \sigma_2 \|L_2\|\right)^2 \le \sigma_1^2 + \sigma_2^2.$$

$$\mathcal{C}_{\rm MAC} = \bigcup_{\delta_1+\delta_2+\delta_0 \le 1} \left\{ (R_1, R_2, R_0) : R_1 \le \delta_1 \sigma_1^2,\ R_2 \le \delta_2 \sigma_2^2,\ R_0 \le \delta_0 \sigma_0^2 \right\}$$

Fig. 2. The bit-pipe deterministic model for multiple-access channels. A discrete-memoryless MAC can be modeled as three bit-pipes
where the capacity of each bit-pipe is $\delta_k \sigma_k^2$. Unlike BCs, virtual transmitters are employed. Tx $k$ indicates a virtual terminal that sends only
$U_k$, for $k = 0, 1, 2$. Hence, physical-Tx $k$ consists of virtual-Tx $k$ and virtual-Tx 0, for $k = 1, 2$.

Here the first inequality is due to the triangle inequality. The second inequality follows from the definition of
$\sigma_1$ and $\sigma_2$: $\sigma_k$ denotes the second largest singular value of $B_k$, $k = 1, 2$. The third inequality comes from the
Cauchy-Schwarz inequality and the unit-norm constraint: $\|L\|^2 = \|L_1\|^2 + \|L_2\|^2 \le 1$. Importantly, note that both
transmitters share the knowledge of the common message, and hence they can cooperate with each other in sending
the common message efficiently. This is reflected in the upper bound of (13), which can be interpreted as the coherent
combining gain (or the beamforming gain).

Moreover, the trade-off between $(R_1, R_2, R_0)$ can be evaluated from $\mu$-sum-rate maximization. For example, the
LIC sum capacity is given by $C_{\rm sum} = \max\{\sigma_1^2, \sigma_2^2, \sigma_0^2\} = \sigma_0^2$, obtained via maximizing the coherent combining
gain.

Unlike the ADT model, our model can capture signal interactions even for non-negligible noisy channels. This 
is demonstrated through the following example. 

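Example 4: Consider a binary-input binary-output MAC with

$$P(0|x_1 x_2) = \begin{cases} 1 - \alpha, & x_1 x_2 \in \{00, 11\}; \\ \alpha, & x_1 x_2 \in \{01, 10\}, \end{cases} \qquad P(1|x_1 x_2) = 1 - P(0|x_1 x_2),\ \forall (x_1, x_2).$$

In fact, this is a binary addition channel:

$$Y = X_1 \oplus X_2 \oplus Z,$$

where $Z \sim {\rm Bern}(\alpha)$. Suppose that both $P_{X_1}$ and $P_{X_2}$ are fixed as $[\frac{1}{2}, \frac{1}{2}]^T$. The probability transition matrices are
then given by

$$W_1 = W_2 = \begin{bmatrix} \frac{1}{2} & \frac{1}{2} \\ \frac{1}{2} & \frac{1}{2} \end{bmatrix}.$$

We can then compute $B_1 = B_2 = W_1$, thus yielding $\sigma_1^2 = \sigma_2^2 = \sigma_0^2 = 0$.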

We now consider a different MAC where the above joint probability distribution is slightly changed as follows:

$$P(0|x_1 x_2) = \begin{cases} 1 - \alpha, & x_1 x_2 \in \{00, 10\}; \\ \alpha, & x_1 x_2 \in \{01, 11\}, \end{cases} \qquad P(1|x_1 x_2) = 1 - P(0|x_1 x_2),\ \forall (x_1, x_2).$$

The only difference here is that the probabilities $P(y|10)$ and $P(y|11)$ are swapped with each other. This simple
change yields different values of $(\sigma_1^2, \sigma_2^2, \sigma_0^2)$. Note that in this case,

$$W_1 = \begin{bmatrix} \frac{1}{2} & \frac{1}{2} \\ \frac{1}{2} & \frac{1}{2} \end{bmatrix}, \qquad W_2 = \begin{bmatrix} 1 - \alpha & \alpha \\ \alpha & 1 - \alpha \end{bmatrix},$$

thus yielding $(\sigma_1^2, \sigma_2^2, \sigma_0^2) = (0, (1-2\alpha)^2, (1-2\alpha)^2)$. Therefore, we can see that even for channels with non-negligible noise,
signal interactions are well captured in our model. □
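The channel parameters in Example 4 can be checked numerically. The sketch below assumes the usual DTM definition from the local geometric framework, $B = [\sqrt{P_Y}]^{-1} W [\sqrt{P_X}]$, which reduces to $B = W$ for the uniform inputs used in the example. The helper names `dtm` and `second_sv_sq_2x2` and the value $\alpha = 0.1$ are illustrative assumptions. For a $2 \times 2$ matrix, the squared singular values are the roots of $t^2 - \|B\|_F^2\, t + \det(B)^2 = 0$, and the smaller root is the squared second singular value.

```python
import math


def dtm(W, p_x):
    """Divergence transition matrix B[y][x] = W[y][x] * sqrt(p_x[x] / p_y[y]).

    W[y][x] is P(Y = y | X = x); p_x is the input distribution.
    (Assumed definition; it gives B = W for uniform binary inputs and outputs.)
    """
    n_y, n_x = len(W), len(p_x)
    p_y = [sum(W[y][x] * p_x[x] for x in range(n_x)) for y in range(n_y)]
    return [[W[y][x] * math.sqrt(p_x[x] / p_y[y]) for x in range(n_x)]
            for y in range(n_y)]


def second_sv_sq_2x2(B):
    """Squared second-largest singular value of a 2x2 matrix."""
    frob = sum(b * b for row in B for b in row)
    det = B[0][0] * B[1][1] - B[0][1] * B[1][0]
    disc = max(frob * frob - 4.0 * det * det, 0.0)
    return (frob - math.sqrt(disc)) / 2.0


alpha = 0.1  # hypothetical noise level
uniform = [0.5, 0.5]
W_flat = [[0.5, 0.5], [0.5, 0.5]]
W_bsc = [[1 - alpha, alpha], [alpha, 1 - alpha]]

sigma1_sq = second_sv_sq_2x2(dtm(W_flat, uniform))  # link that carries no signal
sigma2_sq = second_sv_sq_2x2(dtm(W_bsc, uniform))   # binary symmetric link
print(sigma1_sq, sigma2_sq, (1 - 2 * alpha) ** 2)
```

The flat matrix yields a zero parameter, while the binary symmetric link yields $(1-2\alpha)^2$, matching the values stated for the second MAC.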

We now generalize this deterministic model to arbitrary discrete-memoryless networks. Specifically, we will first
construct a deterministic model for interference channels in Section IV and then extend it to more general networks
in the following sections.


IV. Interference Channels 

The quantification of the channel parameters in (12) and (13) in Section III sheds significant insight into exploring
transmission efficiency in more general networks. Specifically, (13) suggests that common-message transmission in
the MAC is more advantageous due to the coherent combining gain. This motivates us to create common messages
as much as possible. On the other hand, (12) suggests that it consumes more network resources to generate such
common messages than to generate private messages. Hence, there is a fundamental trade-off between the cost of
generating common messages and the benefit from transmitting common messages. With the framework established
in the previous sections, we now investigate this trade-off, thereby optimizing the communication rates
of networks. To this end, we will first explore interference channels in this section.

For an interference channel with two transmitters and two receivers, there are 9 types of messages $U_{ij}$, where
$i, j = 0, 1, 2$. Here $U_{ij}$ indicates a message from virtual-Tx $i$ to virtual-Rx $j$, $i, j \in [0:2]$. Note that $U_{i0}$ denotes
a common message (w.r.t. virtual-Tx $i$) intended for both receivers, while $U_{0j}$ indicates a common message (w.r.t.
virtual-Rx $j$) accessible by both transmitters. Then, the LIC problem for the interference channel is the one that
maximizes a rate region such that

$$R_{ij} \le I(U_{ij}; Y_j),\ j \ne 0, \qquad R_{i0} \le \min\{I(U_{i0}; Y_1), I(U_{i0}; Y_2)\},\ \forall i, \tag{14}$$

subject to the constraints:

$$I(U_{ij}; X_i) \le \delta_{ij},\ i \ne 0,\ \forall j, \qquad I(U_{0j}; X_1, X_2) \le \delta_{0j},\ \forall j, \qquad \sum_{i,j=0,1,2} \delta_{ij} \le 1. \tag{15}$$

Note that the constraints and the objective functions in the above are of the same mutual information forms as
those in the BC and MAC problems in Section III. Therefore, following the same local geometric approach, (14)
can be reduced to

$$R_{ij} \le \delta_{ij} \sigma_{ij}^2,\ \text{for } i, j = 0, 1, 2, \qquad \sum_{i,j=0,1,2} \delta_{ij} \le 1, \tag{16}$$

where $\sigma_{ij}^2$ indicates a channel parameter that quantifies the ability of the channel in transmitting $U_{ij}$, and can be
computed in a similar manner as in Section III:

$$\sigma_{ij}^2 = \begin{cases} \sigma_{\rm smax}^2(B_{ij}), & i \ne 0,\ j \ne 0; \\ \max_{v_i} \min\{\|B_{i1} v_i\|^2, \|B_{i2} v_i\|^2\}, & i \ne 0,\ j = 0; \\ \sigma_{\rm smax}^2([B_{1j}\ B_{2j}]), & i = 0,\ j \ne 0; \\ \max_{u} \min\{\|[B_{11}\ B_{21}] u\|^2, \|[B_{12}\ B_{22}] u\|^2\}, & i = 0,\ j = 0. \end{cases}$$

Here, $B_{ij}$ indicates the DTM with respect to the channel matrix $W_{Y_j|X_i}$ between transmitter $i$ and receiver $j$, and
$(v_1, v_2, u)$ are unit-norm vectors, such that $v_1$ and the first $|\mathcal{X}_1|$ entries of $u$ are orthogonal to $\sqrt{P_{X_1}}$, and $v_2$
and the last $|\mathcal{X}_2|$ entries of $u$ are orthogonal to $\sqrt{P_{X_2}}$. Consequently, the LIC capacity region of the interference
channel is

$$\mathcal{C}_{\rm IC} = \bigcup_{\sum \delta_{ij} \le 1} \left\{ (R_{11}, R_{10}, \dots, R_{22}) : R_{ij} \le \delta_{ij} \sigma_{ij}^2 \right\}. \tag{17}$$

Fig. 3. A deterministic model for interference channels. We consider the most general setting with 9 messages, denoted by $U_{ij}$'s, each
indicating a message from virtual-Tx $i$ to virtual-Rx $j$. This IC can be modeled as 9 bit-pipes, each having the capacity of $\delta_{ij} \sigma_{ij}^2$, where $\delta_{ij}$
indicates the network resource assigned for transmitting $U_{ij}$.

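From (17), we can now construct a deterministic model, applying the same idea as in the previous section. This
deterministic model consists of 9 flexible bit-pipes, where the capacity of each bit-pipe is $\delta_{ij} \sigma_{ij}^2$, and can vary
depending on different allocations of the $\delta_{ij}$'s. An illustration of the deterministic model is shown in Fig. 3. Note that
the presented transmitters and receivers are virtual terminals, and the message $U_{ij}$ is transmitted from Tx $i$ to Rx
$j$. Moreover, the $\sigma_{ij}$'s should satisfy inequalities similar to (12) and (13):

$$\begin{aligned}
\frac{\sigma_{11}^2 \sigma_{12}^2}{\sigma_{11}^2 + \sigma_{12}^2} &\le \sigma_{10}^2 \le \min\{\sigma_{11}^2, \sigma_{12}^2\}, \\
\frac{\sigma_{21}^2 \sigma_{22}^2}{\sigma_{21}^2 + \sigma_{22}^2} &\le \sigma_{20}^2 \le \min\{\sigma_{21}^2, \sigma_{22}^2\}, \\
\frac{\sigma_{01}^2 \sigma_{02}^2}{\sigma_{01}^2 + \sigma_{02}^2} &\le \sigma_{00}^2 \le \min\{\sigma_{01}^2, \sigma_{02}^2\}, \\
\max\{\sigma_{11}^2, \sigma_{21}^2\} &\le \sigma_{01}^2 \le \sigma_{11}^2 + \sigma_{21}^2, \\
\max\{\sigma_{12}^2, \sigma_{22}^2\} &\le \sigma_{02}^2 \le \sigma_{12}^2 + \sigma_{22}^2,
\end{aligned} \tag{18}$$

which can be derived similarly as in the BC and MAC cases.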

Example 5: Consider a quaternary-input binary-output IC where $P(y_1|x_1 x_2)$ is the same as that in an earlier example
but $P(y_2|x_1 x_2)$ is different as

$$P(0|x_1 x_2) = \begin{cases} \frac{1}{3}(2 - \alpha), & x_1 x_2 \in \{22, 23, 20, 32, 33, 30\}; \\ \alpha, & x_1 x_2 \in \{21, 31, 01, 11\}; \\ \frac{1}{3}(4 - 5\alpha), & x_1 x_2 \in \{02, 03, 00, 10\}; \\ \frac{1}{3}(-2 + 7\alpha), & x_1 x_2 \in \{12, 13\}, \end{cases} \qquad P(1|x_1 x_2) = 1 - P(0|x_1 x_2),\ \forall (x_1, x_2).$$

To have valid probability distributions, similarly we assume that $\frac{2}{7} \le \alpha \le \frac{5}{7}$. Suppose that both $P_{X_1}$ and $P_{X_2}$ are
fixed as $[\frac{1}{4}, \frac{1}{4}, \frac{1}{4}, \frac{1}{4}]^T$. The probability transition matrix $W_{ij}$ w.r.t. $P_{Y_j|X_i}$ is then computed as

$$W_{11} = W_{21} = \begin{bmatrix} \frac{1}{2} & \frac{1}{2} & 1 - \alpha & \alpha \\ \frac{1}{2} & \frac{1}{2} & \alpha & 1 - \alpha \end{bmatrix}, \qquad W_{12} = W_{22} = \begin{bmatrix} 1 - \alpha & \alpha & \frac{1}{2} & \frac{1}{2} \\ \alpha & 1 - \alpha & \frac{1}{2} & \frac{1}{2} \end{bmatrix}.$$

This gives $B_{ij} = \frac{1}{\sqrt{2}} W_{ij}$. Performing similar computations as those in Examples 2 and 3, we can get

$$\begin{aligned}
\sigma_{11}^2 = \sigma_{12}^2 = \sigma_{21}^2 = \sigma_{22}^2 &= \tfrac{1}{2}(1 - 2\alpha)^2, \\
\sigma_{10}^2 = \sigma_{20}^2 &= \tfrac{1}{4}(1 - 2\alpha)^2, \\
\sigma_{01}^2 = \sigma_{02}^2 &= (1 - 2\alpha)^2, \\
\sigma_{00}^2 &= \tfrac{1}{2}(1 - 2\alpha)^2.
\end{aligned}$$

This example is an extreme case where sending Rx-common messages is as hard as possible while sending Tx-common
messages is the easiest due to the maximally-achieved beamforming gain. Note that $4\sigma_{10}^2 = 2\sigma_{11}^2 = \sigma_{01}^2$,
thus implying that $(\sigma_{10}^2, \sigma_{20}^2)$ achieve the lower bounds in (18), while $(\sigma_{01}^2, \sigma_{02}^2)$ achieve the upper bounds in (18).

□

In this deterministic model, the trade-off between the 9 message rate-tuples can be characterized by solving the
LP problem for $\mu$-sum-rate maximization. In particular, the LIC sum capacity can be obtained as

$$C_{\rm sum} = \max_{i,j} \sigma_{ij}^2 = \max\{\sigma_{01}^2, \sigma_{02}^2\},$$

where the last equality is due to (18). Therefore, to optimize the total throughput, we will just let either $\delta_{01}$ or $\delta_{02}$
be 1, and deactivate the other links. In other words, the optimal strategy is to transmit a common message accessible
by both transmitters, maximizing the beamforming gain.


V. Multi-hop Layered Networks 

Deterministic models of single-hop networks such as BCs, MACs and ICs do not well capture the trade-off 
between the cost of generating common messages and the benefit from sending common messages. In BCs, only 
the cost due to common-message generation is quantified, while in MACs, we can only investigate the benefit from 
common-message transmission. In ICs, an obvious solution to sum-rate maximization is to maximize the coherent 
combining gain which comes from common-message transmission. 

On the other hand, in multi-hop layered networks, this tension can be well taken into consideration. Notice that 
a common message accessible by multiple transmitters in one layer must be generated from the previous layer. 
Hence, to optimize the throughput, one needs to compare the benefit from common-message transmission in one 
layer with the cost due to common-message generation in the preceding layer. Now one natural question that arises 
in this context is then: how do we plan which kinds of common messages should be generated in a given network 
to maximize the throughput? In this section, we will address this question. 

For illustrative purposes, we consider a general layered network with only two users in each layer, although
our approach can be readily extended to more general cases at the expense of heavy notation. For the two-user
L-layered network, the $\ell$-th layer is an interference channel with input symbols $X_1^{(\ell)}, X_2^{(\ell)}$, output symbols
$Y_1^{(\ell)}, Y_2^{(\ell)}$, and the channel matrix $W^{(\ell)}_{Y_1 Y_2 | X_1 X_2}$.

For simplification, we assume a decode-and-forward operation [Si at each layer: part of messages are decoded at 
each layer and then these are forwarded to the next layer. With the decode-and-forward scheme, one can abstract
each layer as a deterministic model like the one for an IC, and a concatenation of these layers will construct a
deterministic model of the multi-hop layered network. See Fig. 5. Here, we denote by $s_i$ the virtual Tx $i$ in the
first layer, and by $d_i$ the virtual Rx $i$ in the last layer. Denote by $r_i^{(\ell)}$ a node that can act as the virtual Tx $i$ and Rx
$i$ in the $\ell$-th intermediate layer. In addition, the channel of layer $\ell$ consists of 9 bit-pipes, each having the capacity
of $\delta_{ij}^{(\ell)} \sigma_{ij}^{2,(\ell)}$, for $i, j = 0, 1, 2$, and $\ell \in [1:L]$, and the corresponding constraint for the $\delta_{ij}^{(\ell)}$'s is:

$$\frac{1}{L} \sum_{\ell=1}^{L} \sum_{i=0}^{2} \sum_{j=0}^{2} \delta_{ij}^{(\ell)} \le 1. \tag{19}$$

Here the constraint is normalized by the number of layers.

Fig. 4. The L-layered network with two users in each layer. The superscript "$(\ell)$" denotes the $\ell$-th layer of the transmitters, receivers, and
the users.

Fig. 5. A deterministic model for multi-hop interference networks. We introduce a separation principle across layers. We abstract each layer
as the bit-pipe deterministic model, and then constitute an entire network by concatenating these layers. Layer $\ell$ consists of 9 bit-pipes,
each having the capacity of $\delta_{ij}^{(\ell)} \sigma_{ij}^{2,(\ell)}$, $i, j \in [0:2]$ and $\ell \in [1:L]$. Here the $\sigma_{ij}^{2,(\ell)}$ represent the key parameters that characterize layer $\ell$'s
channel.

For simplicity, in this paper, we do not allow any mixing between distinct messages (network coding 0), 
focusing on the routing capacity. Then, for each set of $\delta_{ij}^{(\ell)}$ that satisfies (19), one can obtain a layered network
with fixed capacity $\delta_{ij}^{(\ell)} \sigma_{ij}^{2,(\ell)}$ for each link $(i, j)$ in the $\ell$-th layer. This reduces to the traditional routing problem.

Hence, we can characterize the LIC capacity region of the 9 rate tuples by investigating achievable rate regions
over all possible sets of $\delta_{ij}^{(\ell)}$ subject to (19).

Theorem 1: Consider a two-source two-destination multi-hop layered network illustrated in Fig. 5. Assume that
the 9 messages $U_{ij}$'s are mutually independent. Under the assumption of (19), the LIC capacity region is

$$\mathcal{C}_{\rm LN} = \bigcup_{\sum_{i,j} \delta_{ij} \le 1} \left\{ (R_{11}, \dots, R_{22}) : R_{ij} \le \delta_{ij} \sigma_{ij}^2 \right\},$$

where

$$\sigma_{ij}^2 = \frac{1}{L} \max_{q} M\left(\mathcal{P}_{ij}^{(q)}\right). \tag{20}$$

Here, $\mathcal{P}_{ij}^{(q)}$ denotes a set of the link capacities along the $q$-th path from virtual source $i$ to virtual destination $j$, and
$M(\mathcal{P}_{ij}^{(q)})$ denotes the harmonic mean of the elements in the set $\mathcal{P}_{ij}^{(q)}$.

Fig. 6. The maximum rate for $U_{10}$ when $L = 2$. In this example, we have three possible paths for sending $U_{10}$ as shown in the figure. For
each path, the maximum rate is computed as a harmonic mean of the link capacities along the path, normalized by the number of layers.
Therefore, $\sigma_{10}^2$ is given as above.

Proof: Unlike single-hop networks, in multi-hop networks, each link can be used for multiple purposes, i.e.,
$\delta_{ij}^{(\ell)}$ can be the sum of the network resources consumed for the multiple-message transmission. For conceptual
simplicity, we introduce message-oriented notations $\delta_{ij}$, each indicating the sum of the $\delta_{ij}^{(\ell)}$'s which contribute to
delivering the message $U_{ij}$. The constraint (19) leads to $\sum_{i,j} \delta_{ij} \le 1$. Here the key observation is
that the tradeoff between the 9-message rates is decided only by the constraint $\sum_{i,j} \delta_{ij} \le 1$, i.e., given a fixed
allocation of the $\delta_{ij}$'s, the 9 sub-problems are independent of each other.

Now let us fix the $\delta_{ij}$'s subject to the constraint, and consider the message $U_{ij}$. Since there are $3^{L-1}$ possible paths
for transmission of this message, the problem is reduced to finding the most efficient path that maximizes $R_{ij}$, as
well as finding a corresponding resource allocation for the links along the path. We illustrate the idea of solving this
problem through an example in Fig. 6. Consider the delivery of $U_{10}$. In the case of $L = 2$, we have three possible
paths $(\mathcal{P}_{10}^{(1)}, \mathcal{P}_{10}^{(2)}, \mathcal{P}_{10}^{(3)})$, identified by the blue, red and green paths. The key point here is that the maximum rate for
each path is simply a harmonic mean of the link capacities associated with the path, normalized by the number of
layers. To see this, consider the top blue path $\mathcal{P}_{10}^{(1)}$ consisting of two links with capacities of $\sigma_{11}^{2,(1)}$ and $\sigma_{10}^{2,(2)}$, i.e.,
$\mathcal{P}_{10}^{(1)} = \{\sigma_{11}^{2,(1)}, \sigma_{10}^{2,(2)}\}$. Suppose that $\delta_{10}$ is allocated such that the $\lambda$ fraction is assigned to the first link and the
remaining $(1 - \lambda)$ fraction is assigned to the second link. The rate is then computed as $\min\{\lambda \sigma_{11}^{2,(1)}, (1 - \lambda) \sigma_{10}^{2,(2)}\}$.
Note that this can be maximized as

$$\frac{\sigma_{11}^{2,(1)} \sigma_{10}^{2,(2)}}{\sigma_{11}^{2,(1)} + \sigma_{10}^{2,(2)}} = \frac{1}{2} M\left(\sigma_{11}^{2,(1)}, \sigma_{10}^{2,(2)}\right).$$

Therefore, the maximum rate is

$$\sigma_{10}^2 = \frac{1}{2} \max\left\{ M\left(\sigma_{11}^{2,(1)}, \sigma_{10}^{2,(2)}\right),\ M\left(\sigma_{10}^{2,(1)}, \sigma_{00}^{2,(2)}\right),\ M\left(\sigma_{12}^{2,(1)}, \sigma_{20}^{2,(2)}\right) \right\}.$$

We can easily show that for an arbitrary L-layer case, the maximum rate for each path is the normalized harmonic
mean. This completes the proof. ■
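To make the path-wise computation in (20) concrete, the following Python sketch enumerates all $3^{L-1}$ paths by brute force and returns the best harmonic mean normalized by $L$. The data layout `layers[l][i][j]` (capacity $\sigma_{ij}^{2,(l+1)}$ of link $i \to j$ in layer $l+1$), the function names, and the numbers are assumptions made purely for illustration.

```python
from itertools import product


def harmonic_mean(values):
    """Harmonic mean of positive numbers; 0.0 if any link is unusable."""
    if any(v <= 0 for v in values):
        return 0.0
    return len(values) / sum(1.0 / v for v in values)


def sigma_sq_bruteforce(layers, src, dst):
    """(1/L) * max over paths of the harmonic mean of link capacities.

    layers[l][i][j] is the capacity of the link from virtual node i to
    virtual node j in layer l+1 (nodes are indexed 0, 1, 2).
    """
    num_layers = len(layers)
    best = 0.0
    # A path is fixed by the intermediate node chosen after each of the first L-1 layers.
    for mid in product(range(3), repeat=num_layers - 1):
        nodes = (src,) + mid + (dst,)
        caps = [layers[l][nodes[l]][nodes[l + 1]] for l in range(num_layers)]
        best = max(best, harmonic_mean(caps))
    return best / num_layers


# Hypothetical two-layer network; only the entries on paths from node 1 to node 0 matter here.
layer1 = [[1, 1, 1], [1, 2, 4], [1, 1, 1]]
layer2 = [[3, 1, 1], [2, 1, 1], [4, 1, 1]]
print(sigma_sq_bruteforce([layer1, layer2], src=1, dst=0))
```

In this toy instance the three candidate paths for $U_{10}$ have harmonic means 1.5, 2 and 4, so the path through node 2 determines the result.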

Remark 2 (Viterbi Algorithm): Notice that the complexity for computing the LIC capacity region grows exponentially
with the number of layers: $O(3^L)$. However, the Viterbi algorithm [10] allows us to reduce the complexity
significantly. Note that (20) is equivalent to finding the path such that the sum of the inverses of $\sigma_{i_\ell i_{\ell+1}}^{2,(\ell)}$ is minimized.
Taking $1/\sigma_{i_\ell i_{\ell+1}}^{2,(\ell)}$ as a cost, we can now apply the Viterbi algorithm to find the path with minimal total cost,
and hence the complexity is reduced to $O(L)$. □
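The remark can be sketched as a small dynamic program over the three virtual nodes per layer. The snippet below is an illustrative assumption of how such a Viterbi-style search could look, not the authors' implementation; it uses the same hypothetical layout `layers[l][i][j]` for the capacity of link $i \to j$ in layer $l+1$. Since the harmonic mean of $L$ values divided by $L$ equals the reciprocal of the sum of their inverses, the normalized rate is simply one over the minimal total cost.

```python
def best_path_viterbi(layers, src, dst):
    """Minimize sum_l 1/capacity along a path from src to dst.

    Returns (1 / minimal_cost, path), where path lists the visited
    virtual nodes. Runs in O(9 * L) time for three nodes per layer.
    """
    inf = float("inf")
    cost = [inf] * 3
    cost[src] = 0.0
    back = []  # back[l][j] = best predecessor of node j after layer l+1
    for layer in layers:
        new_cost = [inf] * 3
        arg = [None] * 3
        for j in range(3):
            for i in range(3):
                cap = layer[i][j]
                if cost[i] == inf or cap <= 0:
                    continue
                c = cost[i] + 1.0 / cap
                if c < new_cost[j]:
                    new_cost[j] = c
                    arg[j] = i
        back.append(arg)
        cost = new_cost
    if cost[dst] == inf:
        return 0.0, []
    path = [dst]
    for arg in reversed(back):
        path.append(arg[path[-1]])
    path.reverse()
    return 1.0 / cost[dst], path


# Hypothetical two-layer network (illustrative numbers only).
layer1 = [[1, 1, 1], [1, 2, 4], [1, 1, 1]]
layer2 = [[3, 1, 1], [2, 1, 1], [4, 1, 1]]
print(best_path_viterbi([layer1, layer2], src=1, dst=0))
```

Only three running costs are kept per layer, so the work grows linearly in $L$ rather than with the number of paths.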

In addition, Theorem 1 immediately provides the maximum throughput of this network as shown in the following
corollary.

Corollary 1: Consider a layered network illustrated in Fig. 5. The LIC sum capacity under the constraint (19) is

$$C_{\rm sum} = \max_{i_1, i_2, \dots, i_{L+1} \in [0:2]} M\left( \sigma_{i_1 i_2}^{2,(1)}, \sigma_{i_2 i_3}^{2,(2)}, \dots, \sigma_{i_L i_{L+1}}^{2,(L)} \right), \tag{21}$$

where $M(\cdot)$ denotes the harmonic mean.

Remark 3: Again one can find the optimal path via the Viterbi algorithm with complexity $O(L)$. □

A. Multi-hop Networks with Identical Layers

While Theorem 1 offers a way to find the optimal strategy for general layered networks, it is sometimes more
useful to understand the "patterns" or structures of the optimal communication schemes for large-scale networks.
For instance, suppose that channel parameters are available only locally. Then the communication patterns can serve
to design local communication strategies for users. In this section, we explore the communication patterns for a
certain network: the L-layered network with identical channel parameters for each layer and $L \to \infty$. Specifically,
for all layers $\ell$, the channel parameters are identical and denoted as $\sigma_{ij}^{2,(\ell)} = \sigma_{ij}^2$. The following theorem identifies
the fundamental communication modes of the optimal strategies.

Theorem 2 (Identical layers): Consider a layered network illustrated in Fig. 5, where $\sigma_{ij}^{2,(\ell)} = \sigma_{ij}^2$ for all $\ell$ and
$L \to \infty$. Then, the LIC sum capacity is

$$C_{\rm sum} = \max\left\{ \sigma_{11}^2, \sigma_{00}^2, \sigma_{22}^2, M(\sigma_{10}^2, \sigma_{01}^2), M(\sigma_{20}^2, \sigma_{02}^2), M(\sigma_{12}^2, \sigma_{21}^2), M(\sigma_{10}^2, \sigma_{02}^2, \sigma_{21}^2), M(\sigma_{20}^2, \sigma_{01}^2, \sigma_{12}^2) \right\}, \tag{22}$$

where $M(\cdot)$ denotes the harmonic mean.

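Proof: Let us first prove the converse part. First observe that we use the routing-only scheme to pass information
through the network. Thus, for any optimal communication scheme, we have the inflow equal to the outflow for every
node in the intermediate layers, i.e., for all $k$ and $\ell$,

$$\sum_{i=0}^{2} \delta_{ik}^{(\ell)} \sigma_{ik}^2 = \sum_{j=0}^{2} \delta_{kj}^{(\ell+1)} \sigma_{kj}^2. \tag{23}$$

Moreover, for all $\ell$, the total throughput of the network is $\sum_{k=0}^{2} \sum_{j=0}^{2} \delta_{kj}^{(\ell)} \sigma_{kj}^2$. Now, for a network with L layers, let
us define a tuple of $\delta_{ij}^{(\ell)}$ as a $\gamma$-scheme, if

$$\sum_{k=0}^{2} \left| \sum_{j=0}^{2} \delta_{kj}^{(1)} \sigma_{kj}^2 - \sum_{i=0}^{2} \delta_{ik}^{(L)} \sigma_{ik}^2 \right| = \gamma.$$

Here we define $C_{{\rm sum},\gamma}^{(L)}$ as the optimal achievable throughput among all $\gamma$-schemes. Since our goal is to optimize
the network throughput, it suffices to only consider $\gamma$-schemes that satisfy (23). Now, we want to show that if a
$\gamma$-scheme satisfies (23), then $\gamma$ is upper bounded by $2 \max_{i,j} \sigma_{ij}^2$ and does not increase with L. To see this, note that

$$\gamma \le \sum_{k=0}^{2} \sum_{j=0}^{2} \delta_{kj}^{(1)} \sigma_{kj}^2 + \sum_{k=0}^{2} \sum_{i=0}^{2} \delta_{ik}^{(L)} \sigma_{ik}^2 = 2 C_{{\rm sum},\gamma}^{(L)} \le 2 \max_{i,j} \sigma_{ij}^2,$$

where the first inequality is the triangle inequality, and the second equality comes from the fact that the inflow is
equal to the outflow for the schemes achieving the optimal network throughput (23): hence, $\sum_{k=0}^{2} \sum_{j=0}^{2} \delta_{kj}^{(1)} \sigma_{kj}^2 =
\sum_{k=0}^{2} \sum_{i=0}^{2} \delta_{ik}^{(L)} \sigma_{ik}^2 = C_{{\rm sum},\gamma}^{(L)}$. Finally, the last inequality is a trivial upper bound for the network throughput.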

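Now, the key technique to find the optimal throughput of the L-layered network is to reduce the L-layered
optimization problem to a single-layered one. This is illustrated as follows: for any $\gamma$-scheme $\delta_{ij}^{(\ell)}$ of a network
with L layers that achieves $C_{{\rm sum},\gamma}^{(L)}$ and satisfies (23), we consider the tuple $\delta_{ij}$ for $i, j = 0, 1, 2$, where

$$\delta_{ij} = \frac{1}{L} \sum_{\ell=1}^{L} \delta_{ij}^{(\ell)}.$$

Then, we have

$$\begin{aligned}
\sum_{k=0}^{2} \left| \sum_{j=0}^{2} \delta_{kj} \sigma_{kj}^2 - \sum_{i=0}^{2} \delta_{ik} \sigma_{ik}^2 \right|
&= \frac{1}{L} \sum_{k=0}^{2} \left| \sum_{\ell=1}^{L} \sum_{j=0}^{2} \delta_{kj}^{(\ell)} \sigma_{kj}^2 - \sum_{\ell=1}^{L} \sum_{i=0}^{2} \delta_{ik}^{(\ell)} \sigma_{ik}^2 \right| \\
&= \frac{1}{L} \sum_{k=0}^{2} \left| \sum_{j=0}^{2} \delta_{kj}^{(1)} \sigma_{kj}^2 - \sum_{i=0}^{2} \delta_{ik}^{(L)} \sigma_{ik}^2 \right| = \frac{\gamma}{L},
\end{aligned}$$

where the second equality follows from applying (23) to the intermediate layers, so that the sums telescope.

Therefore, $\delta_{ij}$ is a $(\gamma/L)$-scheme for a new network with only one layer, and this single layer is identical to each of
the L layers of the original L-layered network. Moreover, from (23), for the $\gamma$-scheme of the original L-layered
network, the inflow and outflow of all layers are the same. So, the total throughput of the $(\gamma/L)$-scheme $\delta_{ij}$ of the
new single-layered network is

$$\sum_{k=0}^{2} \sum_{j=0}^{2} \delta_{kj} \sigma_{kj}^2 = \frac{1}{L} \sum_{\ell=1}^{L} \sum_{k=0}^{2} \sum_{j=0}^{2} \delta_{kj}^{(\ell)} \sigma_{kj}^2 = \sum_{k=0}^{2} \sum_{j=0}^{2} \delta_{kj}^{(1)} \sigma_{kj}^2 = C_{{\rm sum},\gamma}^{(L)}.$$

This implies that $C_{{\rm sum},\gamma}^{(L)} \le C_{{\rm sum},\gamma/L}^{(1)}$. Thus, $C_{{\rm sum},\gamma/L}^{(1)}$ is an upper bound for $C_{{\rm sum},\gamma}^{(L)}$, and we only need to show that
$\lim_{L \to \infty} C_{{\rm sum},\gamma/L}^{(1)}$ converges to the right-hand side of (22). To this end, let us first show that $C_{{\rm sum},\epsilon}^{(1)}$ is continuous
at $\epsilon = 0$.

Lemma 1: $\lim_{\epsilon \to 0^+} C_{{\rm sum},\epsilon}^{(1)} = C_{{\rm sum},0}^{(1)}$.

Proof: See Appendix A. ■

Now, note that $\gamma$ is bounded by the constant $2 \max_{i,j} \sigma_{ij}^2$, independent of L, so $\gamma/L \to 0$ in the limit of L. Hence,
we have

$$C_{\rm sum} \le \lim_{L \to \infty} C_{{\rm sum},\gamma/L}^{(1)} = C_{{\rm sum},0}^{(1)}, \tag{24}$$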


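where the limit exists due to the continuity at $\epsilon = 0$. Therefore, an upper bound of $C_{\rm sum}$ can be found by the
following optimization problem:

$$\begin{aligned}
C_{{\rm sum},0}^{(1)}: \quad \max_{\delta_{ij}} \quad & \sum_{k=0}^{2} \sum_{j=0}^{2} \delta_{kj} \sigma_{kj}^2 \\
\text{s.t.} \quad & \sum_{i,j} \delta_{ij} \le 1, \quad \delta_{ij} \ge 0,\ \forall i, j, \\
& \sum_{i=0}^{2} \delta_{ik} \sigma_{ik}^2 = \sum_{j=0}^{2} \delta_{kj} \sigma_{kj}^2,\ \forall k \in [0:2].
\end{aligned}$$

Note that the objective indicates the total amount of information that flows into the destinations. The three equality
constraints in the above can be equivalently written as two equality constraints:

$$\begin{aligned}
\delta_{01} &= \frac{\sigma_{10}^2}{\sigma_{01}^2} \delta_{10} + \frac{\sigma_{20}^2}{\sigma_{01}^2} \delta_{20} - \frac{\sigma_{02}^2}{\sigma_{01}^2} \delta_{02}, \\
\delta_{12} &= \frac{\sigma_{20}^2}{\sigma_{12}^2} \delta_{20} + \frac{\sigma_{21}^2}{\sigma_{12}^2} \delta_{21} - \frac{\sigma_{02}^2}{\sigma_{12}^2} \delta_{02}.
\end{aligned} \tag{25}$$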


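Since all of the $\delta_{ij}$'s are non-negative, we take a careful look at the minus terms associated with $\delta_{02}$. This leads
us to consider two cases: (1) $\delta_{02} = 0$; (2) $\delta_{02} \ne 0$.

The first is an easy case. For $\delta_{02} = 0$, the problem can be simplified into:

$$\begin{aligned}
\max_{\delta_{ij}} \quad & \sum_{i=0}^{2} \delta_{ii} \sigma_{ii}^2 + \left( 2 \delta_{10} \sigma_{10}^2 + 3 \delta_{20} \sigma_{20}^2 + 2 \delta_{21} \sigma_{21}^2 \right) \\
\text{s.t.} \quad & \sum_{i=0}^{2} \delta_{ii} + \left( 1 + \frac{\sigma_{10}^2}{\sigma_{01}^2} \right) \delta_{10} + \left( 1 + \frac{\sigma_{20}^2}{\sigma_{01}^2} + \frac{\sigma_{20}^2}{\sigma_{12}^2} \right) \delta_{20} + \left( 1 + \frac{\sigma_{21}^2}{\sigma_{12}^2} \right) \delta_{21} \le 1, \quad \delta_{ij} \ge 0,\ \forall i, j.
\end{aligned}$$

This LP problem is straightforward. Due to the linearity, the optimal solution must set only one $\delta_{ij}$ to a
non-trivial maximum value while making the other allocations zero. Hence, we obtain:

$$C_{\rm sum} \le \max\left\{ \sigma_{11}^2, \sigma_{00}^2, \sigma_{22}^2, M(\sigma_{10}^2, \sigma_{01}^2), M(\sigma_{12}^2, \sigma_{21}^2), M(\sigma_{20}^2, \sigma_{01}^2, \sigma_{12}^2) \right\}. \tag{26}$$

Here, the fourth term $M(\sigma_{10}^2, \sigma_{01}^2)$, for example, is obtained when $\delta_{10} = \frac{\sigma_{01}^2}{\sigma_{10}^2 + \sigma_{01}^2}$ and $\delta_{ij} = 0$ for $(i, j) \ne (1, 0)$.
The last term $M(\sigma_{20}^2, \sigma_{01}^2, \sigma_{12}^2)$ corresponds to the case when $\delta_{20} = \frac{1}{1 + \sigma_{20}^2/\sigma_{01}^2 + \sigma_{20}^2/\sigma_{12}^2}$ and $\delta_{ij} = 0$ for $(i, j) \ne (2, 0)$.

We next consider the second case of $\delta_{02} \ne 0$. First note that since $\delta_{01}$ and $\delta_{12}$ are nonnegative, by (25), we get

$$\delta_{02} \le \frac{\sigma_{10}^2}{\sigma_{02}^2} \delta_{10} + \frac{\sigma_{20}^2}{\sigma_{02}^2} \delta_{20}, \qquad \delta_{02} \le \frac{\sigma_{20}^2}{\sigma_{02}^2} \delta_{20} + \frac{\sigma_{21}^2}{\sigma_{02}^2} \delta_{21}.$$

The key point here is that in general LP problems, whenever $\delta_{02} \ne 0$, the optimal solution occurs when $\delta_{02}$ is as
large as possible and the above two inequalities are balanced:

$$\delta_{02} = \frac{\sigma_{10}^2}{\sigma_{02}^2} \delta_{10} + \frac{\sigma_{20}^2}{\sigma_{02}^2} \delta_{20} = \frac{\sigma_{20}^2}{\sigma_{02}^2} \delta_{20} + \frac{\sigma_{21}^2}{\sigma_{02}^2} \delta_{21}.$$

Therefore, for $\delta_{02} \ne 0$, the problem can be simplified into:

$$\begin{aligned}
\max_{\delta_{ij}} \quad & \sum_{i=0}^{2} \delta_{ii} \sigma_{ii}^2 + \left( 3 \delta_{10} \sigma_{10}^2 + 2 \delta_{20} \sigma_{20}^2 \right) \\
\text{s.t.} \quad & \sum_{i=0}^{2} \delta_{ii} + \left( 1 + \frac{\sigma_{10}^2}{\sigma_{02}^2} + \frac{\sigma_{10}^2}{\sigma_{21}^2} \right) \delta_{10} + \left( 1 + \frac{\sigma_{20}^2}{\sigma_{02}^2} \right) \delta_{20} \le 1, \quad \delta_{ij} \ge 0,\ \forall i, j.
\end{aligned}$$

This LP problem is also straightforward. Using the linearity, we can get:

$$C_{\rm sum} \le \max\left\{ \sigma_{11}^2, \sigma_{00}^2, \sigma_{22}^2, M(\sigma_{20}^2, \sigma_{02}^2), M(\sigma_{10}^2, \sigma_{02}^2, \sigma_{21}^2) \right\}. \tag{27}$$

By (26) and (27), we complete the converse proof.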

For the achievability, note that $\sigma_{ij}^{2,(\ell)} = \sigma_{ij}^2$, so all 8 modes in (22) can be written in the form $M(\sigma_{i_1 i_2}^2, \sigma_{i_2 i_3}^2, \dots, \sigma_{i_k i_1}^2)$,
for $k = 1, 2, 3$, where $i_1, \dots, i_k$ are mutually different. Then, for $k = 1, 2, 3$, $n \in [1:k]$, and $\ell \in [1:L]$, the
mode $M(\sigma_{i_1 i_2}^2, \sigma_{i_2 i_3}^2, \dots, \sigma_{i_k i_1}^2)$ can be achieved by setting

$$\delta_{i_n i_{n+1}}^{(\ell)} = \frac{M(\sigma_{i_1 i_2}^2, \sigma_{i_2 i_3}^2, \dots, \sigma_{i_k i_1}^2)}{k\, \sigma_{i_n i_{n+1}}^2}, \tag{28}$$

and deactivating all other links by setting their $\delta_{ij}^{(\ell)}$'s to zero. Here, we assume that in (28), when $n = k$, $\delta_{i_n i_{n+1}}^{(\ell)}$
denotes $\delta_{i_k i_1}^{(\ell)}$. It is easy to verify that the assignment of (28) satisfies the constraint (19); thus we prove the
achievability. ■

$$C_{\rm sum} = \max\left\{ \sigma_{11}^2, \sigma_{00}^2, \sigma_{22}^2, M(\sigma_{10}^2, \sigma_{01}^2), M(\sigma_{20}^2, \sigma_{02}^2), M(\sigma_{12}^2, \sigma_{21}^2), M(\sigma_{10}^2, \sigma_{02}^2, \sigma_{21}^2), M(\sigma_{20}^2, \sigma_{01}^2, \sigma_{12}^2) \right\}$$

Fig. 7. LIC sum capacity of multi-hop interference networks with identical layers.

Fig. 8. The rolling of different pieces of information between users layer by layer for the optimal communication scheme that achieves (a)
$\sigma_{11}^2$, (b) $M(\sigma_{10}^2, \sigma_{01}^2)$, (c) $M(\sigma_{10}^2, \sigma_{02}^2, \sigma_{21}^2)$.

Theorem 2 implies that the optimal communication scheme is one of the eight communication modes
in (22). Fig. 7 illustrates the communication schemes that achieve the modes $\sigma_{00}^2$, $M(\sigma_{12}^2, \sigma_{21}^2)$, and $M(\sigma_{10}^2, \sigma_{02}^2, \sigma_{21}^2)$,
where the other modes can be achieved similarly. For example, the mode $M(\sigma_{10}^2, \sigma_{02}^2, \sigma_{21}^2)$ is achieved by using links
$1 \to 0$, $0 \to 2$, and $2 \to 1$, such that

$$\delta_{10} \sigma_{10}^2 = \delta_{02} \sigma_{02}^2 = \delta_{21} \sigma_{21}^2 = \frac{1}{3} M(\sigma_{10}^2, \sigma_{02}^2, \sigma_{21}^2),$$

and other $\delta_{ij} = 0$. Then, the information flow for each layer and the sum rate are all $M(\sigma_{10}^2, \sigma_{02}^2, \sigma_{21}^2)$.

More interestingly, achieving (22) requires cooperation between users, and rolling the knowledge
of different parts of the messages between users layer by layer. We demonstrate this by considering the communication
scheme that achieves $M(\sigma_{10}^2, \sigma_{02}^2, \sigma_{21}^2)$ as an example. Suppose that at the first layer, the node $s_i$ has the knowledge
of message $U_i$, for $i = 0, 1, 2$. Since $s_0$ is the virtual node that represents the common message of both users,
user 1 knows messages $(U_0, U_1)$, and user 2 knows $(U_0, U_2)$. Then, to achieve $M(\sigma_{10}^2, \sigma_{02}^2, \sigma_{21}^2)$, user 1 broadcasts
its private message $U_1$ to both users in the next layer, and both users in the first layer cooperate to transmit their
common message to user 2 in the next layer as the private message. Thus, in the second layer user 1 decodes
messages $(U_1, U_2)$ and user 2 decodes $(U_1, U_0)$. Similarly, in the third layer, user 1 decodes $(U_2, U_0)$ and user
2 decodes $(U_2, U_1)$, and then the pattern loops back. This effect is shown in Fig. 8(c). Therefore, according to the values of
the channel parameters, Theorem 2 identifies the optimal communication mode, and hence indicates what kind of
common messages should be generated to achieve the optimal sum rate.
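As an illustration of how the eight modes of Theorem 2 could be evaluated for given channel parameters, consider the following Python sketch. The layout `s[i][j]` for $\sigma_{ij}^2$, the mode labels, and the numerical values are hypothetical; the values were merely chosen to be consistent with the inequalities in (18).

```python
def harmonic_mean(*values):
    """Harmonic mean of positive numbers; 0.0 if any value is non-positive."""
    if any(v <= 0 for v in values):
        return 0.0
    return len(values) / sum(1.0 / v for v in values)


def identical_layer_sum_capacity(s):
    """Evaluate the eight communication modes, with s[i][j] = sigma_ij^2.

    Returns (best value, label of the best mode). Each label lists the
    links used, e.g. "10-01" means links 1->0 and 0->1 are active.
    """
    modes = {
        "11": s[1][1],
        "00": s[0][0],
        "22": s[2][2],
        "10-01": harmonic_mean(s[1][0], s[0][1]),
        "20-02": harmonic_mean(s[2][0], s[0][2]),
        "12-21": harmonic_mean(s[1][2], s[2][1]),
        "10-02-21": harmonic_mean(s[1][0], s[0][2], s[2][1]),
        "20-01-12": harmonic_mean(s[2][0], s[0][1], s[1][2]),
    }
    best = max(modes, key=modes.get)
    return modes[best], best


# Hypothetical parameters: rows are Tx index i, columns are Rx index j.
s = [
    [0.5, 0.8, 0.8],  # sigma_00^2, sigma_01^2, sigma_02^2
    [0.4, 0.4, 0.4],  # sigma_10^2, sigma_11^2, sigma_12^2
    [0.3, 0.4, 0.4],  # sigma_20^2, sigma_21^2, sigma_22^2
]
print(identical_layer_sum_capacity(s))
```

In this instance the two-link mode that alternates between creating a common message (link $1 \to 0$) and exploiting the beamforming gain (link $0 \to 1$) beats every single-link mode.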

VI. Feedback 

We next explore the role of feedback under our local geometric approach. As in the previous section, we employ 
the decode-and-forward scheme for both forward and feedback transmissions, under which decoded messages at 
each node (instead of analog received signals) are fed back to the nodes in preceding layers. In this model, one 
can view the feedback as bit-pipe links added on top of a deterministic channel. With this assumption on the 
feedback, we can see that in the deterministic model of the BC, as received signals are functions of transmitted
signals, so is feedback. Therefore, feedback does not increase the LIC capacity region. The deterministic MAC can
be interpreted as three parallel point-to-point channels, where feedback is shown to be useless in increasing the
traditional capacity [11]. Hence, the LIC capacity region does not increase with feedback either. In contrast, we
will show that feedback can indeed increase the LIC capacity region for a variety of scenarios in multi-hop layered 
networks. Let us start with interference channels. 


A. Interference Channels 

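Theorem 3: Consider the deterministic model of interference channels illustrated in Fig. 3. Assume that decoded
messages at each receiver are fed back to all the transmitters. Let $\delta_{ij}$ be the network resource consumed for
delivering the message $U_{ij}$, and assume $\sum \delta_{ij} \le 1$. The feedback LIC capacity region is then

$$\mathcal{C}_{\rm IC}^{\rm fb} = \bigcup_{\sum \delta_{ij} \le 1} \left\{ (R_{11}, \dots, R_{22}) : R_{k0} \le \delta_{k0} \sigma_{k0}^{2,{\rm fb}},\ k \ne 0,\ \ R_{ij} \le \delta_{ij} \sigma_{ij}^2,\ (i, j) \ne (1, 0), (2, 0) \right\},$$

where

$$\begin{aligned}
\sigma_{10}^{2,{\rm fb}} &= \max\left\{ \sigma_{10}^2,\ \frac{M(\sigma_{12}^2, \sigma_{01}^2)}{2},\ \frac{M(\sigma_{11}^2, \sigma_{02}^2)}{2} \right\}, \\
\sigma_{20}^{2,{\rm fb}} &= \max\left\{ \sigma_{20}^2,\ \frac{M(\sigma_{21}^2, \sigma_{02}^2)}{2},\ \frac{M(\sigma_{22}^2, \sigma_{01}^2)}{2} \right\}.
\end{aligned} \tag{29}$$

Proof: Fix the $\delta_{ij}$'s subject to the constraint. First, consider the transmission of $U_{ij}$ when $(i, j) \ne (1, 0), (2, 0)$. In
this case, the maximum rate can be achieved by using the Tx $i$-to-Rx $j$ link. Hence, $R_{ij} \le \delta_{ij} \sigma_{ij}^2$.

On the other hand, in sending $U_{10}$, we may have better alternative paths. One alternative way is to take a route
as shown in Fig. 9: virtual-Tx 1 $\rightarrow$ virtual-Rx 2 $\xrightarrow{\text{feedback}}$ virtual-Tx 0 $\rightarrow$ virtual-Rx 1. The message is clearly a common message intended
for both receivers, as it is delivered to both virtual-Rxs. Suppose that the network resource $\delta_{10}$ is allocated such that
the $\lambda$ fraction is assigned to the $\sigma_{12}^2$-capacity link and the remaining $(1 - \lambda)$ fraction is assigned to the $\sigma_{01}^2$-capacity
link. The rate is then $\min\{\lambda \sigma_{12}^2, (1 - \lambda) \sigma_{01}^2\}$, which can be maximized as $\frac{1}{2} M(\sigma_{12}^2, \sigma_{01}^2)$. The other alternative path
is: virtual-Tx 1 $\rightarrow$ virtual-Rx 1 $\xrightarrow{\text{feedback}}$ virtual-Tx 0 $\rightarrow$ virtual-Rx 2. With this route, we can achieve $\frac{1}{2} M(\sigma_{11}^2, \sigma_{02}^2)$.
Therefore, we can obtain $\sigma_{10}^{2,{\rm fb}}$ as claimed. Similarly, we can get the claimed $\sigma_{20}^{2,{\rm fb}}$. ■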

Remark 4 (Role of feedback): In the traditional communication setting, it is well known that feedback can
increase the capacity region of MACs and degraded BCs [12], [13], but the capacity improvement is marginal,
providing at most a constant number of bits in the Gaussian channel. On the other hand, feedback can provide a
more significant gain in ICs: in the Gaussian channel, it provides an arbitrarily large gain as signal-to-noise ratios
of the links increase lU. In the LIC problem setting, the impact of feedback is similar yet slightly different. The 
difference is that for MACs and BCs, feedback has no bearing on the LIC capacity regions. However, as can be seen
from Theorem 3, feedback can strictly increase the LIC capacity region in the interference channels.

Fig. 9. An alternative way to deliver the common message $U_{10}$. One alternative way is to take a route: virtual-Tx 1 $\rightarrow$ virtual-Rx
2 $\xrightarrow{\text{feedback}}$ virtual-Tx 0 $\rightarrow$ virtual-Rx 1. The message is clearly a common message intended for both receivers, as it is delivered to both
virtual-Rxs. We can optimize the allocation to the two links to obtain the rate of $\frac{1}{2} M(\sigma_{12}^2, \sigma_{01}^2)$.

$$\mathcal{C}_{\rm IC}^{\rm fb} = \bigcup \left\{ (R_{11}, R_{10}, \dots, R_{22}) : R_{10} \le \delta_{10} \sigma_{10}^{2,{\rm fb}},\ R_{20} \le \delta_{20} \sigma_{20}^{2,{\rm fb}},\ R_{ij} \le \delta_{ij} \sigma_{ij}^2 \right\}$$

Fig. 10. Interference channels with feedback. A feedback IC can be interpreted as a nonfeedback IC where $(\sigma_{10}^2, \sigma_{20}^2)$ are replaced by the
$(\sigma_{10}^{2,{\rm fb}}, \sigma_{20}^{2,{\rm fb}})$ in (29).

Also the nature


of the feedback gain is similar to that in lH, llT4l : relaying gain. From Fig|3 one can see that feedback provides 
an alternative better path, thus making the beamforming gain effectively larger compared to the nonfeedback case. 
Also the feedback gain can be multiplicative, which is qualitatively similar to the gain in the two-user Gaussian 
interference channels lH. Here is a concrete example in which feedback provides a multiplicative gain in the LIC 
capacity region. □ 

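Example 6: Consider the same interference channel as in Example 5, but with feedback links from all
receivers to all transmitters. We obtain the same $\sigma_{ij}^2$'s except the following two:

$$\sigma_{10}^{2,{\rm fb}} = \sigma_{20}^{2,{\rm fb}} = \frac{1}{3}(1 - 2\alpha)^2 > \sigma_{10}^2 = \sigma_{20}^2.$$

Note that $\sigma_{10}^{2,{\rm fb}} / \sigma_{10}^2 = \frac{4}{3}$ when $\alpha \ne \frac{1}{2}$, implying a 33% gain w.r.t. $R_{10}$. □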
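The feedback parameters of Theorem 3 can be evaluated directly from the nonfeedback ones. The Python sketch below does this for the parameter values of Example 5; the layout `s[i][j]` for $\sigma_{ij}^2$, the function names, and the choice $\alpha = 0.3$ are illustrative assumptions.

```python
def hm2(a, b):
    """Harmonic mean of two positive numbers; 0.0 if either is non-positive."""
    if a <= 0 or b <= 0:
        return 0.0
    return 2.0 * a * b / (a + b)


def feedback_params(s):
    """Return (sigma_10^{2,fb}, sigma_20^{2,fb}), with s[i][j] = sigma_ij^2.

    Each parameter is the best of the direct common-message link and the
    two relay-via-feedback routes described in the proof of Theorem 3.
    """
    s10_fb = max(s[1][0], hm2(s[1][2], s[0][1]) / 2.0, hm2(s[1][1], s[0][2]) / 2.0)
    s20_fb = max(s[2][0], hm2(s[2][1], s[0][2]) / 2.0, hm2(s[2][2], s[0][1]) / 2.0)
    return s10_fb, s20_fb


alpha = 0.3  # hypothetical value of the channel parameter
c = (1 - 2 * alpha) ** 2
s = [
    [c / 2, c, c],          # sigma_00^2, sigma_01^2, sigma_02^2
    [c / 4, c / 2, c / 2],  # sigma_10^2, sigma_11^2, sigma_12^2
    [c / 4, c / 2, c / 2],  # sigma_20^2, sigma_21^2, sigma_22^2
]
s10_fb, s20_fb = feedback_params(s)
print(s10_fb / s[1][0], s20_fb / s[2][0])
```

Both ratios come out as $4/3$, consistent with the 33% gain reported in Example 6.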

Remark 5: With Theorem 3, one can simply model an interference channel with feedback as a nonfeedback
interference channel, in which the channel parameters $(\sigma_{10}^2, \sigma_{20}^2)$ are replaced by the $(\sigma_{10}^{2,{\rm fb}}, \sigma_{20}^{2,{\rm fb}})$ in (29). See Fig. 10.

B. Multi-hop Layered Networks

For multi-hop layered networks, we investigate two feedback models: (1) the full-feedback model, where the decoded
messages at each node are fed back to the nodes in all the preceding layers; (2) the layered-feedback model, where the
feedback is available only to the nodes in the immediately preceding layer.

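Theorem 4: Consider a multi-hop layered network illustrated in Fig. 5. Assume that the $\delta_{ij}^{(\ell)}$'s satisfy the constraint
of (19). Then, the feedback LIC capacity region of the full-feedback model is the same as that of the layered-feedback
model, and is given by

$$\mathcal{C}_{\rm LN}^{\rm fb} = \bigcup \left\{ (R_{11}, R_{10}, \dots, R_{22}) : R_{ij} \le \delta_{ij} \sigma_{ij}^{2,{\rm fb}} \right\}, \tag{30}$$

where

$$\sigma_{ij}^{2,{\rm fb}} = \frac{1}{L} \max_{q} M\left(\mathcal{P}_{ij}^{(q)}\right).$$

Here, the elements of the set $\mathcal{P}_{ij}^{(q)}$ are with respect to a translated network where $(\sigma_{10}^{2,(\ell)}, \sigma_{20}^{2,(\ell)})$ are replaced by
$(\sigma_{10}^{2,(\ell),{\rm fb}}, \sigma_{20}^{2,(\ell),{\rm fb}})$ for each layer $\ell \in [1:L]$:

$$\begin{aligned}
\sigma_{10}^{2,(\ell),{\rm fb}} &= \max\left\{ \sigma_{10}^{2,(\ell)},\ \frac{M(\sigma_{12}^{2,(\ell)}, \sigma_{01}^{2,(\ell)})}{2},\ \frac{M(\sigma_{11}^{2,(\ell)}, \sigma_{02}^{2,(\ell)})}{2} \right\}, \\
\sigma_{20}^{2,(\ell),{\rm fb}} &= \max\left\{ \sigma_{20}^{2,(\ell)},\ \frac{M(\sigma_{21}^{2,(\ell)}, \sigma_{02}^{2,(\ell)})}{2},\ \frac{M(\sigma_{22}^{2,(\ell)}, \sigma_{01}^{2,(\ell)})}{2} \right\}.
\end{aligned} \tag{31}$$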


Proof: First, let us prove the equivalence between the full-feedback and layered-feedback models. We introduce 
some notations. Let $X_i[t]$ be the transmitted signal of virtual source $s_i$ at time $t$; let $X_i^{(\ell)}[t]$ be the transmitted signal 
of virtual node $i$ in layer $\ell$ at time $t$; and let $X^{(\ell)}[t] = [X_1^{(\ell)}[t], X_0^{(\ell)}[t], X_2^{(\ell)}[t]]$, where $\ell \in [1 : L-1]$. Define 
$X^{(\ell),t} = \{X^{(\ell)}[j]\}_{j=1}^{t}$. Let $Y_i^{(\ell)}[t]$ be the received signal of virtual node $i$ in layer $\ell$ at time $t$, and let 
$Y^{(\ell)}[t] = [Y_1^{(\ell)}[t], Y_0^{(\ell)}[t], Y_2^{(\ell)}[t]]$, where $\ell \in [1 : L]$. 
Let $U_i = [U_{i1}, U_{i0}, U_{i2}]$. We use the notation $A \overset{f}{=} B$ to indicate that $A$ is a function of $B$. 

Under the full-feedback model, we then get 


^ (32) 


where the second step follows from the fact that in deterministic layered networks, ^ function of 

the third step follows from the fact that second last step is 

due to iterating the previous steps {t — 3) times. 
Fig. 11. Network equivalence. The feedback LIC capacity region of the full-feedback model is the same as that of the layered-feedback 
model. 


Using similar arguments, we can also show that for $\ell \in [1 : L-1]$, 

— y(r+i),i—1 

L (33) 

L (y(d.*-l^ 3^(^+l) J3j) 

— y(^+i)ii~i) 

The functional relationships of (32) and (33) imply that any rate point in the full-feedback LIC capacity region 
can also be achieved in the layered-feedback LIC capacity region. This proves the equivalence of the two feedback 
models. See Fig. 11. 

We next focus on the LIC capacity region characterization under the layered-feedback model. The key idea 
is to employ Theorem 3, thus translating each layer with feedback into an equivalent nonfeedback layer, where 
$(\sigma_{10}^{2,(\ell)}, \sigma_{20}^{2,(\ell)})$ are replaced by the $(\sigma_{10}^{2,(\ell),\mathrm{fb}}, \sigma_{20}^{2,(\ell),\mathrm{fb}})$ in (31). We can then apply Theorem 1 to obtain the claimed LIC 
capacity region. ■ 
Fig. 12. The input $X_1$ is composed of two binary inputs $X_1'$ and $X_1''$, and the input $X_2$ is binary. The output $Y_1 = X_1' \oplus X_2$, and the 
output $Y_2 = X_1''$. 


C. Multi-hop Networks with Identical Layers 

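Theorem 5: Consider a multi-hop layered network in which $\sigma_{ij}^{2,(\ell)} = \sigma_{ij}^2$ for all $\ell$ and $L = \infty$. For both full-feedback 
and layered-feedback models, the feedback LIC sum capacity is the same as 

$$
C_{\mathrm{sum}}^{\mathrm{fb}} = \max\Big\{ \sigma_{11}^2,\ \sigma_{00}^2,\ \sigma_{22}^2,\ M(\sigma_{10}^{2,\mathrm{fb}}, \sigma_{01}^2),\ M(\sigma_{20}^{2,\mathrm{fb}}, \sigma_{02}^2),\ M(\sigma_{12}^2, \sigma_{21}^2),\ M(\sigma_{10}^{2,\mathrm{fb}}, \sigma_{02}^2, \sigma_{21}^2),\ M(\sigma_{20}^{2,\mathrm{fb}}, \sigma_{01}^2, \sigma_{12}^2) \Big\}
\qquad (34)
$$

where $(\sigma_{10}^{2,\mathrm{fb}}, \sigma_{20}^{2,\mathrm{fb}})$ are of the same formulas as those in (29). 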

Proof: The proof is immediate from Theorems 2, 3 and 4. First, with Theorem 4, it suffices to focus on 
the layered-feedback model. We then employ Theorem 3 to translate each layer with the layered feedback into an 
equivalent nonfeedback layer with the replaced parameters $(\sigma_{10}^{2,\mathrm{fb}}, \sigma_{20}^{2,\mathrm{fb}})$. We can then use Theorem 2 to obtain the 
desired LIC sum capacity. ■ 

We see from Example 6 that the LIC sum capacity does not increase with feedback in a single-hop network. On 
the other hand, in multi-hop networks, we find that the LIC sum capacity can increase with feedback. Here is an 
example. 

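Example 7: Consider a multi-hop layered network in which each layer is the interference channel shown in 
Fig. 12. Tx 1 has two binary inputs $X_1'$ and $X_1''$, and Tx 2 has one binary input. The output $Y_1$ is equal to $X_1' \oplus X_2$ 
and the output $Y_2$ is equal to $X_1''$. Suppose that $P_{X_2}$ is fixed as $[0.1585, 0.8415]$, and $P_{X_1} = P_{X_1' X_1''}$ is fixed as 

$$
P_{X_1' X_1''}(x_1' x_1'') =
\begin{cases}
0.095, & x_1' x_1'' \in \{00, 01\}; \\
0.405, & x_1' x_1'' \in \{10, 11\}.
\end{cases}
$$

Then, we have 

$$
\begin{aligned}
(\sigma_{11}^2, \sigma_{12}^2, \sigma_{10}^2) &= (0.35, 1, 0.26), \\
(\sigma_{21}^2, \sigma_{22}^2, \sigma_{20}^2) &= (0.25, 0, 0), \\
(\sigma_{01}^2, \sigma_{02}^2, \sigma_{00}^2) &= (0.6, 1, 0.375).
\end{aligned}
$$

From Theorem 2, the nonfeedback LIC sum capacity is computed as $C_{\mathrm{sum}} = M(\sigma_{12}^2, \sigma_{21}^2) = 0.4$. On the other 
hand, $(\sigma_{10}^{2,\mathrm{fb}}, \sigma_{20}^{2,\mathrm{fb}}) = (0.375, 0.2)$ and from Theorem 5, the feedback LIC sum capacity is computed as $C_{\mathrm{sum}}^{\mathrm{fb}} = 
M(\sigma_{10}^{2,\mathrm{fb}}, \sigma_{01}^2) = 0.4615$, thus showing a 15.4% improvement. □ 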
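The numbers in Example 7 can be reproduced with a short script. This is only a sketch under two assumptions that are inferred from the quoted values rather than taken from a stated definition: that $M(\cdot)$ denotes the harmonic mean of its arguments, and that the feedback-translated parameters are $\sigma_{10}^{2,\mathrm{fb}} = \max\{\sigma_{10}^2, M(\sigma_{11}^2, \sigma_{02}^2)/2, M(\sigma_{12}^2, \sigma_{01}^2)/2\}$ and $\sigma_{20}^{2,\mathrm{fb}} = \max\{\sigma_{20}^2, M(\sigma_{21}^2, \sigma_{02}^2)/2, M(\sigma_{22}^2, \sigma_{01}^2)/2\}$. The function names are hypothetical.

```python
def harmonic_mean(*values):
    """Assumed form of M(.): harmonic mean, taken as 0 if any argument is 0."""
    if any(v == 0 for v in values):
        return 0.0
    return len(values) / sum(1.0 / v for v in values)


# sigma2[(i, j)]: parameter from virtual source i to virtual destination j (Example 7).
sigma2 = {
    (1, 1): 0.35, (1, 2): 1.0, (1, 0): 0.26,
    (2, 1): 0.25, (2, 2): 0.0, (2, 0): 0.0,
    (0, 1): 0.6, (0, 2): 1.0, (0, 0): 0.375,
}


def feedback_params(s):
    """Assumed feedback translation of (sigma_10^2, sigma_20^2)."""
    fb10 = max(s[1, 0],
               harmonic_mean(s[1, 1], s[0, 2]) / 2,
               harmonic_mean(s[1, 2], s[0, 1]) / 2)
    fb20 = max(s[2, 0],
               harmonic_mean(s[2, 1], s[0, 2]) / 2,
               harmonic_mean(s[2, 2], s[0, 1]) / 2)
    return fb10, fb20


def sum_capacity(s, s10, s20):
    """Maximum over the eight candidate terms of the sum-capacity formula."""
    m = harmonic_mean
    return max(
        s[1, 1], s[0, 0], s[2, 2],
        m(s10, s[0, 1]), m(s20, s[0, 2]), m(s[1, 2], s[2, 1]),
        m(s10, s[0, 2], s[2, 1]), m(s20, s[0, 1], s[1, 2]),
    )


fb10, fb20 = feedback_params(sigma2)
c_nofb = sum_capacity(sigma2, sigma2[1, 0], sigma2[2, 0])
c_fb = sum_capacity(sigma2, fb10, fb20)
gain = c_fb / c_nofb - 1.0
print(round(fb10, 4), round(fb20, 4), round(c_nofb, 4), round(c_fb, 4), round(100 * gain, 1))
```

Under these assumptions the script recovers the values quoted in the example: feedback parameters (0.375, 0.2), sum capacities 0.4 and 0.4615, and a gain of about 15.4%.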

We also find some classes of symmetric multi-hop layered networks, where feedback provides no gain in LIC 
sum capacity. 

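Corollary 2: Consider a two-source two-destination symmetric multi-hop layered network, where 

$$
\begin{aligned}
\lambda &:= \sigma_{11}^2 = \sigma_{12}^2 = \sigma_{21}^2 = \sigma_{22}^2, \\
\mu &:= \sigma_{10}^2 = \sigma_{20}^2, \\
\sigma &:= \sigma_{01}^2 = \sigma_{02}^2, \\
&\phantom{:=}\ \sigma_{00}^2.
\end{aligned}
$$

Assume that the parameters of $(\lambda, \mu, \sigma, \sigma_{00}^2)$ satisfy ([T^ . We then get: 

$$
C_{\mathrm{sum}}^{\mathrm{fb}} = C_{\mathrm{sum}} = \max\{\lambda,\ \sigma_{00}^2,\ M(\mu, \sigma),\ M(\mu, \lambda, \sigma)\}.
$$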
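Proof: Theorem 2 immediately yields $C_{\mathrm{sum}} = \max\{\lambda, \sigma_{00}^2, M(\mu, \sigma), M(\mu, \lambda, \sigma)\}$. From Theorem 5, we get: 

$$
C_{\mathrm{sum}}^{\mathrm{fb}} = \max\left\{ \lambda,\ \sigma_{00}^2,\ M\!\left(\max\left\{\mu, \frac{M(\lambda, \sigma)}{2}\right\}, \sigma\right),\ M\!\left(\max\left\{\mu, \frac{M(\lambda, \sigma)}{2}\right\}, \sigma, \lambda\right) \right\}.
$$

Note that 

$$
M\!\left(\frac{M(\lambda, \sigma)}{2}, \sigma\right) = \lambda \cdot \frac{2\sigma}{2\lambda + \sigma} \le \lambda,
$$

where the inequality comes from $\sigma \le 2\lambda$ due to (fT^ . Similarly we can show that $M\!\left(\frac{M(\lambda, \sigma)}{2}, \sigma, \lambda\right) \le \lambda$. Therefore, 
$C_{\mathrm{sum}}^{\mathrm{fb}} = C_{\mathrm{sum}}$. 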
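The three-argument bound that the proof leaves to the reader can be checked directly. The derivation below is a sketch that assumes $M(\cdot)$ denotes the harmonic mean of its arguments, an assumption that is consistent with the two-argument identity $M(M(\lambda,\sigma)/2, \sigma) = \lambda \cdot 2\sigma/(2\lambda+\sigma)$ used in the proof. It relies on $M(\lambda,\sigma)/2 = \lambda\sigma/(\lambda+\sigma)$.

```latex
\begin{align*}
M\!\left(\frac{M(\lambda,\sigma)}{2},\, \sigma,\, \lambda\right)
  &= \frac{3}{\dfrac{\lambda+\sigma}{\lambda\sigma} + \dfrac{1}{\sigma} + \dfrac{1}{\lambda}}
   = \frac{3}{\dfrac{2(\lambda+\sigma)}{\lambda\sigma}}
   = \lambda \cdot \frac{3\sigma}{2\lambda + 2\sigma}, \\[4pt]
\lambda \cdot \frac{3\sigma}{2\lambda + 2\sigma} \le \lambda
  &\iff 3\sigma \le 2\lambda + 2\sigma
   \iff \sigma \le 2\lambda .
\end{align*}
```

Both the two-argument and the three-argument bounds therefore hinge on the same condition $\sigma \le 2\lambda$.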


VII. Discussions 

A. Extension 

A generalization to arbitrary $M$-source $K$-destination networks is straightforward. In the most general setting, 
we have $(2^M - 1)$ virtual sources, $(2^K - 1)$ virtual destinations, and $(2^M - 1)(2^K - 1)$ messages. For example, in 
the case of $(M, K) = (3, 3)$, 

virtual sources: $s_1, s_2, s_3, s_{12}, s_{13}, s_{23}, s_{123}$, 
virtual destinations: $d_1, d_2, d_3, d_{12}, d_{13}, d_{23}, d_{123}$, 

where, for instance, $s_{12}$ indicates a virtual terminal that sends messages accessible by sources 1 and 2; and $d_{12}$ 
denotes a virtual terminal that decodes messages intended for destinations 1 and 2. And we have $7 \times 7 = 49$ 
messages, denoted by $U_{\mathcal{S},\mathcal{V}}$, where $\mathcal{S}, \mathcal{V} \subseteq \{1, 2, 3\}$ $(\neq \emptyset)$, each indicating a message which is accessible by the 
set $\mathcal{S}$ of sources, and is intended for the set $\mathcal{V}$ of destinations. For this network, we can then obtain 49-dimensional 
LIC capacity regions and LIC sum capacities, as we did in Theorems 1 and 2. We can also extend to networks 
with feedback, thus obtaining the results corresponding to Theorems 4 and 5. 
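The counting above can be illustrated by enumerating the nonempty subsets explicitly. The following is a minimal sketch; the helper names `nonempty_subsets` and `label` are hypothetical and only serve the illustration.

```python
from itertools import combinations


def nonempty_subsets(n):
    """All nonempty subsets of {1, ..., n}, ordered by size, then lexicographically."""
    items = range(1, n + 1)
    return [subset
            for size in range(1, n + 1)
            for subset in combinations(items, size)]


def label(prefix, subset):
    """Turn a subset such as (1, 2) into a terminal name such as 's12'."""
    return prefix + "".join(str(i) for i in subset)


M, K = 3, 3
virtual_sources = [label("s", S) for S in nonempty_subsets(M)]
virtual_destinations = [label("d", V) for V in nonempty_subsets(K)]

# One message per (source set, destination set) pair.
messages = [(S, V) for S in nonempty_subsets(M) for V in nonempty_subsets(K)]

print(virtual_sources)
print(virtual_destinations)
print(len(messages))
```

For $(M, K) = (3, 3)$ this lists the seven virtual sources and seven virtual destinations given above and yields 49 messages.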

An extension to cyclic networks is also straightforward. The key idea is to employ an unfolding technique which 
enables us to translate a cyclic network into an equivalent layered network. Once it is converted into a layered 
network, we can then apply the same techniques developed herein, thus obtaining similar results. 
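The text does not spell out the unfolding technique, so the following is only a sketch of one common construction, the time-expanded graph, offered as an assumption about what such an unfolding might look like. Each node is replicated once per layer, and every edge $u \to v$ of the cyclic network becomes an edge from the copy of $u$ in layer $t$ to the copy of $v$ in layer $t+1$. The function name `unfold` is hypothetical.

```python
def unfold(edges, num_hops):
    """Time-expand a directed (possibly cyclic) graph into a layered graph.

    Returns the sorted node list and the layered edges ((u, t), (v, t + 1))
    for t = 0, ..., num_hops - 1. Every layered edge goes from one layer to
    the next, so the result contains no directed cycle.
    """
    nodes = sorted({u for u, _ in edges} | {v for _, v in edges})
    layered_edges = [((u, t), (v, t + 1))
                     for t in range(num_hops)
                     for u, v in edges]
    return nodes, layered_edges


# A three-node directed cycle a -> b -> c -> a, unfolded over two hops.
cyclic_edges = [("a", "b"), ("b", "c"), ("c", "a")]
nodes, layered = unfold(cyclic_edges, 2)
for src, dst in layered:
    print(src, "->", dst)
```

Whether the resulting layered network is equivalent in the sense needed for the LIC capacity results depends on details that this sketch does not model.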

B. Non-separation Approach & Network Coding 

In this work, we have assumed a separation scheme between layers. Only decoded messages at each node are 
forwarded to next layers. We also focused on the routing capacity, not allowing for network coding. So one future 
research direction of interest would be developing a non-separation and/or network-coding approach to explore 
whether or not it provides a performance improvement over the separation approach. 

C. Applications of the Local Geometric Approach 

In this work, we took a local geometric approach based on an approximation of KL-divergence, to address a 
class of network information theory problems that are often quite challenging. We find this approach useful for 
a variety of communication scenarios and other interesting applications. As mentioned earlier in Remark 1, one 
such communication scenario is a cognitive radio network in which the secondary users wish to exchange their 
information while minimizing interference to the existing communication network. Here one can model the encoding 
of secondary users’ signals as the superposition coding to the existing primary signals. Given the constraint on 
the interference level, the secondary users’ signals will only slightly perturb the conditional input distribution w.r.t. 
the primary signals from the original input distribution. Then, the decoder will detect the perturbation to decode 
secondary users’ messages. Therefore, our model serves to study the efficiency of exchanging information between 
secondary users through superposition coding, when the perturbation to the existing primary signals is restricted. 

In addition to communication problems, the local geometric approach can be applied to stock-market 
networks. It has been shown in [13] that the local geometric approach plays a crucial role in finding an investment 
strategy that maximizes an incremental growth rate in repeated investments [15]. The local geometric approach has 
also been exploited in a wide range of applications in machine learning: a learning problem in graphical models [16], 
an inference problem in hidden Markov models [17], [18], and big networked data analytics via communication 
and information theory [19], [20]. 
VIII. Conclusion 

In this paper, we investigate the problem of how to efficiently transmit information through discrete-memoryless 
networks, by perturbing the given distributions of the nodes in the networks. In particular, we apply the local 
approximation technique to study this problem and construct a new type of deterministic model for multi-layer 
networks. Then, we employ this deterministic model to investigate the optimization of the throughput of multi-layer 
networks. Our results illustrate the optimal communication strategy for network users to optimize the efficiency 
of transmitting information through large-scale networks. In addition, we also consider multi-layer networks 
with feedback using our deterministic model. We find that for some classes of networks, feedback can provide insights 
into designing efficient information flows in large communication networks. 


Appendix A 
Proof of Lemma [U 

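In this Appendix, we show that $C_{\mathrm{sum},\epsilon}$ is continuous at $\epsilon = 0$, i.e., $\lim_{\epsilon \to 0^+} C_{\mathrm{sum},\epsilon} = C_{\mathrm{sum},0}$. By the squeeze 
theorem, the continuity holds if the following inequalities are established: for $\epsilon > 0$, 

$$
C_{\mathrm{sum},\epsilon} \ge C_{\mathrm{sum},0} \ge C_{\mathrm{sum},\epsilon} - 4 \max_{\sigma_{ij} \neq 0}\{\sigma_{ij}^{-2}\}\, \epsilon \sum_{i,j} \sigma_{ij}^2.
\qquad (35)
$$

The upper bound of (35) is trivial from the definition of $C_{\mathrm{sum},\epsilon}$. To show the lower bound of (35), we consider an 
optimal solution $\{\delta_{ij}^*\}_{i,j \in [0:2]}$ of $C_{\mathrm{sum},\epsilon}$, i.e., an optimal solution of the optimization problem: 

$$
\begin{aligned}
C_{\mathrm{sum},\epsilon} = \max_{\delta_{ij}}\ & \sum_{i,j} \delta_{ij} \sigma_{ij}^2 \\
\text{s.t.}\ & \sum_{i,j} \delta_{ij} \le 1, \quad \delta_{ij} \ge 0, \\
& \sum_{k=0}^{2} \left| \sum_{j=0}^{2} \delta_{kj} \sigma_{kj}^2 - \sum_{i=0}^{2} \delta_{ik} \sigma_{ik}^2 \right| \le \epsilon.
\end{aligned}
$$

If we can show that there exists a set of $\{\delta_{ij}\}_{i,j \in [0:2]}$ satisfying 

$$
\delta_{ij} \ge 0\ \ \forall i, j, \qquad \sum_{i,j} \delta_{ij} \le 1,
\qquad (36)
$$

$$
\sum_{k=0}^{2} \left| \sum_{j=0}^{2} \delta_{kj} \sigma_{kj}^2 - \sum_{i=0}^{2} \delta_{ik} \sigma_{ik}^2 \right| \le 0,
\qquad (37)
$$

and 

$$
|\delta_{ij}^* - \delta_{ij}| \le 4 \max_{\sigma_{ij} \neq 0}\{\sigma_{ij}^{-2}\}\, \epsilon, \quad \forall\, i, j,
\qquad (38)
$$

then from (36) and (37), we have $C_{\mathrm{sum},0} \ge \sum_{i,j} \delta_{ij} \sigma_{ij}^2$. Moreover, from (38), we get 

$$
\sum_{i,j} \delta_{ij} \sigma_{ij}^2 \ge \sum_{i,j} \delta_{ij}^* \sigma_{ij}^2 - 4 \max_{\sigma_{ij} \neq 0}\{\sigma_{ij}^{-2}\}\, \epsilon \sum_{i,j} \sigma_{ij}^2
= C_{\mathrm{sum},\epsilon} - 4 \max_{\sigma_{ij} \neq 0}\{\sigma_{ij}^{-2}\}\, \epsilon \sum_{i,j} \sigma_{ij}^2,
$$

which implies the lower bound of (35). 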

The idea of constructing such $\{\delta_{ij}\}_{i,j \in [0:2]}$ is to first design each $\delta_{ij}$ as a perturbation to $\delta_{ij}^*$, such that $\delta_{ij} \ge 0$ and 
satisfy (37) and (38). Then, the resultant $\delta_{ij}$'s are multiplied by a normalizing factor to meet the constraint $\sum_{i,j} \delta_{ij} \le 1$. 
To show the design of the perturbations, we define $\alpha_k = \sum_{j=0}^{2} \delta_{kj}^* \sigma_{kj}^2 - \sum_{i=0}^{2} \delta_{ik}^* \sigma_{ik}^2$, where $\sum_{k=0}^{2} \alpha_k = 0$ from 
the definition. Then, since $\alpha_0$, $\alpha_1$, and $\alpha_2$ are symmetric w.r.t. $\sigma_{ij}$, we can without loss of generality assume $\alpha_0 \ge 0$, 
$\alpha_1 \ge 0$, and $\alpha_2 \le 0$. In the following, we demonstrate the constructions of $\{\delta_{ij}\}_{i,j \in [0:2]}$ for the cases of $\sigma_{20}$ and 
$\sigma_{21}$ being zero or nonzero: 

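(1) $\sigma_{20} \neq 0$, $\sigma_{21} \neq 0$: 

In this case, we first design $\delta_{20}$ and $\delta_{21}$ as $\delta_{20}^* + \sigma_{20}^{-2} \alpha_0$ and $\delta_{21}^* + \sigma_{21}^{-2} \alpha_1$, and $\delta_{ij} = \delta_{ij}^*$ for the rest $i$ 
and $j$. Then, it is easy to verify that (37) is satisfied. To meet the constraint $\sum_{i,j} \delta_{ij} \le 1$, we normalize $\delta_{ij}$ 
by multiplying a factor $(1 + \sigma_{20}^{-2} \alpha_0 + \sigma_{21}^{-2} \alpha_1)^{-1}$. The verification of (38) for the resultant $\delta_{ij}$ is 
straightforward. For example, for $\delta_{20} = (1 + \sigma_{20}^{-2} \alpha_0 + \sigma_{21}^{-2} \alpha_1)^{-1} (\delta_{20}^* + \sigma_{20}^{-2} \alpha_0)$, we have 

$$
\begin{aligned}
|\delta_{20}^* - \delta_{20}|
&\le \left| \frac{\sigma_{20}^{-2} \alpha_0 + \sigma_{21}^{-2} \alpha_1}{1 + \sigma_{20}^{-2} \alpha_0 + \sigma_{21}^{-2} \alpha_1}\, \delta_{20}^* \right|
 + \left| \frac{\sigma_{20}^{-2} \alpha_0}{1 + \sigma_{20}^{-2} \alpha_0 + \sigma_{21}^{-2} \alpha_1} \right| \\
&\le 2 \max_{\sigma_{ij} \neq 0}\{\sigma_{ij}^{-2}\}\, \epsilon + \max_{\sigma_{ij} \neq 0}\{\sigma_{ij}^{-2}\}\, \epsilon
 = 3 \max_{\sigma_{ij} \neq 0}\{\sigma_{ij}^{-2}\}\, \epsilon,
\end{aligned}
$$

where the first inequality is the triangle inequality, and the second inequality is due to $\delta_{20}^* \le 1$ and $\sum_{k=0}^{2} |\alpha_k| \le \epsilon$, 
which implies $|\alpha_k| \le \epsilon$ for all $k$. 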
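The construction for the case in which both $\sigma_{20}$ and $\sigma_{21}$ are nonzero can be checked numerically. The sketch below assumes that $\alpha_k$ is the outgoing rate minus the incoming rate at virtual node $k$, that is $\alpha_k = \sum_j \delta_{kj}^* \sigma_{kj}^2 - \sum_i \delta_{ik}^* \sigma_{ik}^2$, which is the sign convention suggested by the later cases. The numerical values and function names are made up for illustration.

```python
def imbalance(delta, sigma2, k):
    """alpha_k under the assumed convention: outgoing minus incoming rate at node k."""
    out_flow = sum(delta[k][j] * sigma2[k][j] for j in range(3))
    in_flow = sum(delta[i][k] * sigma2[i][k] for i in range(3))
    return out_flow - in_flow


def repair_case1(delta_star, sigma2):
    """Perturb delta_20 and delta_21, then normalize (sigma_20, sigma_21 nonzero)."""
    alpha = [imbalance(delta_star, sigma2, k) for k in range(3)]
    if alpha[0] < 0 or alpha[1] < 0:
        raise ValueError("expects alpha_0 >= 0 and alpha_1 >= 0 (the WLOG ordering)")
    bump20 = alpha[0] / sigma2[2][0]
    bump21 = alpha[1] / sigma2[2][1]
    delta = [row[:] for row in delta_star]
    delta[2][0] += bump20  # extra flow 2 -> 0 absorbs the imbalance at node 0
    delta[2][1] += bump21  # extra flow 2 -> 1 absorbs the imbalance at node 1
    scale = 1.0 + bump20 + bump21
    return [[d / scale for d in row] for row in delta]


# Illustrative (made-up) parameters; rows index sources, columns index destinations.
sigma2 = [[0.5, 0.4, 0.9],
          [0.3, 0.6, 0.8],
          [0.2, 0.1, 1.0]]
delta_star = [[0.1, 0.1, 0.1] for _ in range(3)]

delta = repair_case1(delta_star, sigma2)
residuals = [imbalance(delta, sigma2, k) for k in range(3)]
total = sum(sum(row) for row in delta)
print(residuals, total)
```

After the perturbation and normalization, all three imbalances vanish up to floating-point error, every $\delta_{ij}$ stays nonnegative, and the $\delta_{ij}$'s sum to at most 1, so (36) and (37) hold for this instance.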

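(2) $\sigma_{20} \neq 0$, $\sigma_{21} = 0$: 

(i) $\sigma_{01} \neq 0$: 

In this case, $\delta_{01}$ and $\delta_{20}$ are designed as $\delta_{01}^* + \sigma_{01}^{-2} \alpha_1$ and $\delta_{20}^* + \sigma_{20}^{-2} (\alpha_0 + \alpha_1)$. In addition, we design 
$\delta_{21}$ as 0, and for the rest $i$ and $j$, $\delta_{ij} = \delta_{ij}^*$. Then, it is easy to check that (37) is satisfied. Moreover, 
we multiply each $\delta_{ij}$ by a factor $(1 + \sigma_{01}^{-2} \alpha_1 + \sigma_{20}^{-2} (\alpha_0 + \alpha_1))^{-1}$ so that the constraint $\sum_{i,j} \delta_{ij} \le 1$ is 
satisfied. To verify (38), note that when $\sigma_{ij} = 0$ for some $(i, j)$, then the corresponding $\delta_{ij}^* = 0$, since 
$\{\delta_{ij}^*\}_{i,j \in [0:2]}$ is an optimal solution. Thus, we have $\delta_{21}^* = \delta_{21} = 0$. The verification of (38) for the rest 
$\delta_{ij}$'s is the same as the case (1) by noting that $|\alpha_0 + \alpha_1| \le |\alpha_0| + |\alpha_1| \le \epsilon$. 

(ii) $\sigma_{01} = 0$: 

In this case, we design $\delta_{20}$ as $\delta_{20}^* + \sigma_{20}^{-2} \alpha_0 + \sigma_{20}^{-2} \sigma_{10}^2 \delta_{10}^*$, and $\delta_{01}, \delta_{10}, \delta_{12}, \delta_{21}$ as 0. In addition, for the rest 
$i$ and $j$, $\delta_{ij} = \delta_{ij}^*$. Then, a factor $(1 + \sigma_{20}^{-2} \alpha_0 + \sigma_{20}^{-2} \sigma_{10}^2 \delta_{10}^*)^{-1}$ is multiplied to each $\delta_{ij}$ for normalization. 
One can easily check that the resultant $\delta_{ij}$'s satisfy both (36) and (37). To verify (38), since $\sigma_{21} = \sigma_{01} = 0$, 
we have $\sigma_{10}^2 \delta_{10}^* + \sigma_{12}^2 \delta_{12}^* = \alpha_1 \le \epsilon$. Hence, $\sigma_{1k}^2 \delta_{1k}^* \le \epsilon$, which implies $|\delta_{1k}^* - \delta_{1k}| \le \max_{\sigma_{ij} \neq 0}\{\sigma_{ij}^{-2}\}\, \epsilon$, 
for $k = 0, 2$. Moreover, for $\delta_{20} = (1 + \sigma_{20}^{-2} \alpha_0 + \sigma_{20}^{-2} \sigma_{10}^2 \delta_{10}^*)^{-1} (\delta_{20}^* + \sigma_{20}^{-2} \alpha_0 + \sigma_{20}^{-2} \sigma_{10}^2 \delta_{10}^*)$, 

$$
\begin{aligned}
|\delta_{20}^* - \delta_{20}|
&\le \left| \frac{\sigma_{20}^{-2} \alpha_0 + \sigma_{20}^{-2} \sigma_{10}^2 \delta_{10}^*}{1 + \sigma_{20}^{-2} \alpha_0 + \sigma_{20}^{-2} \sigma_{10}^2 \delta_{10}^*}\, \delta_{20}^* \right|
 + \left| \frac{\sigma_{20}^{-2} \alpha_0 + \sigma_{20}^{-2} \sigma_{10}^2 \delta_{10}^*}{1 + \sigma_{20}^{-2} \alpha_0 + \sigma_{20}^{-2} \sigma_{10}^2 \delta_{10}^*} \right| \\
&\le \left| \sigma_{20}^{-2} \alpha_0 + \sigma_{20}^{-2} \sigma_{10}^2 \delta_{10}^* \right| + \left| \sigma_{20}^{-2} \alpha_0 \right| + \left| \sigma_{20}^{-2} \sigma_{10}^2 \delta_{10}^* \right| \\
&\le 4 \max_{\sigma_{ij} \neq 0}\{\sigma_{ij}^{-2}\}\, \epsilon,
\end{aligned}
$$

where the second inequality is due to $\delta_{20}^* \le 1$, and the third inequality is from $|\alpha_0| \le \epsilon$ and $\sigma_{10}^2 \delta_{10}^* \le \epsilon$. 
Finally, the verification of (38) for the rest $\delta_{ij}$'s is the same as the previous cases. 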

(3) $\sigma_{20} = 0$, $\sigma_{21} \neq 0$: 

This case is symmetric to the case (2). By exchanging the role of subindexes $20 \leftrightarrow 21$, $01 \leftrightarrow 10$, and $\alpha_0 \leftrightarrow \alpha_1$, 
the construction is the same as the case (2). 

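(4) $\sigma_{20} = 0$, $\sigma_{21} = 0$: 

(i) $\sigma_{10} \neq 0$: 

In this case, if $\alpha_1 - \sigma_{02}^2 \delta_{02}^* \ge 0$, we design $\delta_{10}$ as $\delta_{10}^* + \sigma_{10}^{-2} \alpha_1 - \sigma_{10}^{-2} \sigma_{02}^2 \delta_{02}^*$; otherwise, design $\delta_{01}$ 
as $\delta_{01}^* - \sigma_{01}^{-2} \alpha_1 + \sigma_{01}^{-2} \sigma_{02}^2 \delta_{02}^*$. In addition, we design $\delta_{20}, \delta_{21}, \delta_{02}, \delta_{12}$ as 0, and for the rest $i$ and $j$, 
$\delta_{ij} = \delta_{ij}^*$. We multiply a factor $(1 + |\sigma_{10}^{-2} \alpha_1 - \sigma_{10}^{-2} \sigma_{02}^2 \delta_{02}^*|)^{-1}$ to each $\delta_{ij}$ for normalization. Then, one 
can check that (36) and (37) are satisfied for the resultant $\delta_{ij}$. Note that since $\sigma_{20} = \sigma_{21} = 0$, we get 
$\sigma_{02}^2 \delta_{02}^* + \sigma_{12}^2 \delta_{12}^* = -\alpha_2 \le \epsilon$, which implies $\sigma_{k2}^2 \delta_{k2}^* \le \epsilon$ for $k = 0, 1$. Thus, by the same procedure as (ii) 
of the case (2), we can verify (38). 

(ii) $\sigma_{10} = 0$: 

In this case, we simply set all the $\delta_{ij}$'s to be zero. Since $\sigma_{20} = \sigma_{21} = \sigma_{10} = 0$, we get $\sigma_{01}^2 \delta_{01}^* + \sigma_{02}^2 \delta_{02}^* = 
\alpha_0 \le \epsilon$ and $\sigma_{02}^2 \delta_{02}^* + \sigma_{12}^2 \delta_{12}^* = -\alpha_2 \le \epsilon$, which imply $\delta_{ij}^* \le \max_{\sigma_{ij} \neq 0}\{\sigma_{ij}^{-2}\}\, \epsilon$ for all $i$ and $j$. Thus, (36) 
to (38) are satisfied. 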